Backup Pi

My online backup provider, CrashPlan, cancelled home users’ accounts to focus on business users. Now what?

Spoilers: This is unfinished at time of publishing. I’ll update this post as I progress towards a solution I’m happy with.

There aren’t any obvious Linux-compatible unlimited (priced per computer rather than per GB) online backup solutions left, and I didn’t fancy running one in a VM or any other jiggery-pokery like that.

Cloud providers would be ideal. BackBlaze B2 is a good candidate (https://www.backblaze.com/b2/cloud-storage-pricing.html), but Wasabi is the cheapest I’ve found (https://wasabi.com/pricing/). Even so, 8TiB over 3 years with one full restore works out at $1468, which is still too expensive for my needs. I also suspect any cheaper provider will change or increase its pricing, or close, within a few years, and I don’t want to find a new solution every two years.
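The arithmetic behind an estimate like this is a monthly storage charge plus a per-GB download charge for each full restore. Here is a minimal sketch of that sum; `backup_cost` is a hypothetical helper, and the rates in the example call are made-up placeholders, not either provider's actual price list.

```shell
# Rough cost model: monthly storage charge plus a per-GB download (egress)
# charge for each full restore. All rates passed in are hypothetical.
backup_cost() {
    # usage: backup_cost SIZE_GB MONTHS STORAGE_RATE EGRESS_RATE RESTORES
    awk -v size="$1" -v months="$2" -v store="$3" -v egress="$4" -v restores="$5" \
        'BEGIN { printf "%.2f\n", size * months * store + size * egress * restores }'
}

# 8TiB is 8192GiB; three years is 36 months; one full restore.
# The two rates (dollars per GB) are placeholders to be replaced with real ones.
backup_cost 8192 36 0.005 0.01 1
```

Plugging in a provider's published rates gives a figure that can be compared directly against the one-off hardware cost of a self-hosted box.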

Potential Solution: run my own offsite backup system.

Hardware List

$35 Raspberry Pi 3

$16 SD Card

$30 USB Dual HDD Dock

$275 2 * WD RED 8TiB

Total $666

Seeding the backup onsite saves the usual ~month of initial upload nonsense.

Configuring the Pi

Power

It works on normal USB power, as it isn’t doing anything fancy enough to need the extra watts.

Operating system

Raspbian without a GUI, whatever was the latest release.

Basic WiFi Networking

The location has public WiFi; that will do.

#/etc/network/interfaces.d/wlan0
auto wlan0
iface wlan0 inet dhcp
wireless-essid GuestWifiNoCrypto

VPN

My pi is installed in a firewalled location and can’t act as a server that my backup machine can reach.

Simplest VPN: ssh tunnel!

However, a tunnel created with plain ssh is very temporary. autossh is a solution that will start ssh on boot and restart it if the connection fails, but it needs some encouragement to get the most robust config. The flag that bit me, and that was missing from other examples, is ExitOnForwardFailure: until I had that, the ssh VPN was reconnecting but not establishing the reverse tunnel whenever an old stale one hadn’t quite been cleaned up.

# On the home machine, create an unprivileged account for the tunnel:
homemachine$ sudo adduser --system pitunnel
# On the Pi, create an ssh key for the inbound tunnel:
pi$ ssh-keygen -t rsa -b 4096
# Install the key on my home machine.
pi$ ssh-copy-id pitunnel@my.dyndns.address
# Test the key; this should connect without a password.
pi$ ssh pitunnel@my.dyndns.address

Here is my autossh setup.

# /lib/systemd/system/autossh.service 
[Unit]
Description=tunnel
After=network-online.target
Wants=network-online.target
StartLimitIntervalSec=0

[Service]
User=pi
Restart=always
RestartSec=5
Environment=AUTOSSH_GATETIME=0
ExecStart=/usr/bin/autossh -M 0 -N -q -o "ServerAliveInterval 60" -o "ServerAliveCountMax 3" -o "ExitOnForwardFailure yes" -p 22 -l pitunnel my.dyndns.address -R 2200:127.0.0.1:22 -i /home/pi/.ssh/id_rsa

[Install]
WantedBy=multi-user.target

Now enable it, start it and check it is running OK.

$ sudo systemctl enable autossh
$ sudo systemctl start autossh
$ sudo systemctl status autossh
● autossh.service - tunnel
   Loaded: loaded (/lib/systemd/system/autossh.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2017-10-xx 01:01:38 UTC; 0 days ago
 Main PID: 739 (autossh)
   CGroup: /system.slice/autossh.service
           ├─ 739 /usr/lib/autossh/autossh -M 0 -N -q -o ServerAliveInterval 60 -o ServerAliveCountMax 3 -o ExitOnForwardFailure yes -p 22 -l pitunnel my.dyndns.address -R 2200:127.0.0.1:22 -i /home/pi/.ssh/id_
           └─2748 /usr/bin/ssh -N -q -o ServerAliveInterval 60 -o ServerAliveCountMax 3 -o ExitOnForwardFailure yes -p 22 -l pitunnel -R 2200:127.0.0.1:22 -i /home/pi/.ssh/id_rsa my.dyndns.address

Set up a convenience ssh host target, so you don’t need all the usernames and ports in every ssh and scp command you want to use:

# /root/.ssh/config 
Host pitunnel
    Hostname 127.0.0.1
    User borg
    port 2200
    IdentityFile ~/.ssh/borg

Test it with

ssh pitunnel

It should connect with no password prompt.
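If the connection hangs or is refused instead, it can help to check on the home machine whether the reverse-forwarded port is listening at all. This is a minimal sketch, assuming `ss` from iproute2 is installed; `port_listening` is a hypothetical helper that reads `ss -ltn` style output on stdin.

```shell
# Succeeds if the `ss -ltn` output on stdin shows a listener on the given port.
port_listening() {
    # Column 4 is "Local Address:Port"; the port is whatever follows the last colon.
    awk -v port="$1" '
        NR > 1 { n = split($4, a, ":"); if (a[n] == port) found = 1 }
        END    { exit !found }'
}
```

On the home machine, `ss -ltn | port_listening 2200 && echo "reverse tunnel is listening"` would then confirm that the `-R 2200:127.0.0.1:22` forward from the Pi is in place.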

Installing the disks

TODO I’ve not really finished this section.

I had initially used a btrfs raid1 mirror instead of a Linux mdadm RAID, but one of the drives in my dock had connectivity issues, which exposed a few problems:

  • Running a btrfs balance took far, far too long (days). I have no idea why btrfs takes so long to rebalance; it is not limited by IO speed. I’d like someone to benchmark a btrfs rebalance in RAM and see how slow it still is.
  • It would crash when trying to mount after a bad reset, needing to be mounted from my PC (more RAM / different kernel / whatever else) in order to clean up the filesystem state. This happened more than once.
  • It would force the filesystem read-only after a single degraded mount once I took a disk out. This is supposedly fixed in Linux 4.14: http://lkml.iu.edu/hypermail/linux/kernel/1709.1/00609.html

I refuse to use a non-checksumming filesystem, and zfs is out of the question for me right now (not open for discussion here, may work for you), so I’m sticking with btrfs, but ignoring the RAID modes.

$EDITOR /etc/fstab

/dev/disk/by-label/borgbackup /backup btrfs defaults,noatime,nofail 0 2

Linux RAID jiggery: I’ve done this a squillion times, but haven’t actually set it up yet.

$ sudo apt install smartmontools

TODO: You should probably configure a mail relay so emailing from the pi works.
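Until mail works, a cron job could at least check the overall SMART health of both disks and complain on stderr. This is only a sketch: `check_disks` is a hypothetical helper, and `-d sat` is an assumption about how the USB dock exposes the drives, so it may need changing or dropping.

```shell
# Succeeds only if every listed disk reports an overall SMART health of PASSED.
check_disks() {
    status=0
    for dev in "$@"; do
        # -H prints the overall health assessment; -d sat is an assumption
        # that the USB dock passes ATA commands through.
        if ! smartctl -H -d sat "$dev" | grep -q 'PASSED'; then
            echo "SMART health check did not pass for $dev" >&2
            status=1
        fi
    done
    return "$status"
}
```

It would be run as root with the dock's device names, for example `check_disks /dev/sda /dev/sdb`; those names are placeholders.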

watchdog

The systemd watchdog looks like an interesting framework, but didn’t have an immediately obvious way of checking that the net connection was OK, so I opted for the lazy/easy watchdog package.

sudo apt install watchdog

$EDITOR /etc/watchdog.conf
# set the following lines
# google dns
ping = 8.8.8.8
# check wifi interface
interface = wlan0
# the Pi watchdog only accepts small values (10 works); the default of 60 errors
watchdog-timeout = 10

Misc Config

I had the Pi plugged into a spare HDMI port on a nearby monitor so I could check in on it from time to time, and plugged into a USB switcher to swap the keyboard/mouse over.

A $6 USB KVM switcher from AliExpress was the cheapest I found; I just ignore the VGA ports.

Annoyingly, when my machine was locked, the Pi would occasionally reboot or steal the monitor’s input over HDMI.

#/boot/config.txt
hdmi_blanking=1

This setting is supposed to fix it, at the cost of breaking media player software horribly, which is OK as I don’t use any on this Pi. I’ll update this with my experience.

Backup software

restic

I first tried restic, which ticked most of my boxes: Go, encrypted, authenticated, de-duplicating, incremental. However, it seems to have problems pruning over slow links, and it ran out of memory when I tried to prune from the Pi itself. It looks like it has scope to grow into a solid backup tool, but it doesn’t quite meet my low memory / high latency needs at the moment.

Issues I ran into:

No snapshot usage stats: https://github.com/restic/restic/issues/693

Random Crash: https://github.com/restic/restic/issues/1251

Poor errors with SFTP backend: https://github.com/restic/restic/issues/1323

Very little progress output on prune: https://github.com/restic/restic/issues/1322

No network stats: https://github.com/restic/restic/issues/1033

The restic author maintains a detailed feature list of alternative backup software, which I can highly recommend to anyone exploring backup systems.

The most mature alternative candidate seems to be borgbackup. I didn’t really need the compression that borgbackup adds, and I didn’t want the overhead that Python has, but it seems to tick most boxes.

borgbackup

I used the borgbackup 1.1.1 release from pip, as it fixes some issues I had with clearing out stale locks (after the watchdog resets the Pi) and Raspbian hasn’t updated yet.
Version 1.1.0 managed to crash my host machine due to a bug where it opened every block device, and possibly a secondary bug in my kernel; the kernel side needs further investigation: https://github.com/borgbackup/borg/issues/3213

$ sudo pip3 install borgbackup

I dropped the script from the borg backup Quick Start (Automating Backups) into /etc/cron.daily/borgbackup, added the -x flag to stay on one filesystem, and then added the paths of the mounts I wanted.

 export BORG_REPO=ssh://pitunnel/~/repo
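For reference, the resulting cron script has roughly the following shape. This is a sketch loosely following the Quick Start rather than my exact file: `run_backup` is a hypothetical wrapper, the keep counts are placeholders, and an encrypted repo would also need `BORG_PASSPHRASE` exported.

```shell
#!/bin/sh
# Sketch of /etc/cron.daily/borgbackup, loosely following the borg Quick Start.
export BORG_REPO='ssh://pitunnel/~/repo'

run_backup() {
    # --one-file-system (-x) stops borg crossing into other mounts, so every
    # filesystem to back up has to be listed explicitly as an argument.
    borg create --verbose --stats --show-rc \
        --one-file-system --exclude-caches \
        ::'{hostname}-{now}' "$@"
    create_rc=$?

    # Thin out old archives; the keep counts here are placeholders.
    borg prune --list --show-rc --prefix '{hostname}-' \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 6
    prune_rc=$?

    # Report the worse of the two exit codes (borg uses 1 for warnings, 2 for errors).
    [ "$create_rc" -gt "$prune_rc" ] && return "$create_rc"
    return "$prune_rc"
}
```

The script would end with a call such as `run_backup / /home /srv`, where the mount points are examples only.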

Oops, I slightly corrupted my repo with the watchdog resets and lock fudgery; the prune was crashing with a missing directory.

Python single-threaded, CPU-bound, multiple-day-long check… awarghgjbwakwffkawkfkafhgh. It’s still not finished and I have little idea about its progress (it has spent all the time on Analyzing archive “hostname-2017-10-18T11:44:25.checkpoint (3/19)”), and I have 15 idle cores.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
22288 root      20   0 1000796 897456   6984 R 100.0  5.5   3844:24 /usr/bin/python3 /usr/bin/borg check --verbose --repair

Day 7: This breached 10000 minutes (about 7 days) of runtime, so I’ve shut it down.

Day 14: I forgot to check up on it for a while; miraculously it’s been running for a week making backups every day! Hooray!

But the repo isn’t “healthy”.

$ borg delete $BORG_REPO::darkskiez-2017-10-23T23:05:06.checkpoint
borg.repository.Repository.ObjectNotFound: Object with key fe00c10f6ce4b733d013ee30a60f47284368e4fa47164cb0d8af507dc0acedf0 not found in repository ssh://pitunnel/~/reopo::darkskiez-2017-10-23T23:05:06.checkpoint.

Sod it, I’ll force it and nuke the lot of them.

borg list --short | grep checkpoint | xargs -I{} borg delete --force $BORG_REPO::{}
forced deletion succeeded, but the deleted archive was corrupted.
borg check --repair is required to free all space. 

Argh! borg list still shows a checkpoint file, and borg delete --force is erroring with the same not found error. How hard is it to delete something that isn’t there?!!?

We know how well borg check worked last time, but I really didn’t want it to repair these checkpoints before, maybe now it will work 😐 I don’t have high hopes

using builtin fallback logging configuration
35 self tests completed in 0.08 seconds
SSH command line: ['ssh', 'pitunnel', 'borg', 'serve', '--umask=077', '--debug']
Remote: using builtin fallback logging configuration
Remote: 35 self tests completed in 0.72 seconds
Remote: using builtin fallback logging configuration
Remote: Initialized logging system for JSON-based protocol
Remote: Resolving repository path b'/~/repo'
Remote: Resolved repository path to '/backup/repo'
'check --repair' is an experimental feature that might result in data loss.
Type 'YES' if you understand this and want to continue: YES
Remote: Starting repository check
Remote: Verified integrity of /backup/repo/index.179252
Remote: Read committed index of transaction 179252
Remote: Cleaned up 0 uncommitted segment files (== everything after segment 179252).
Remote: Segment transaction is    179252
Remote: Determined transaction is 179252
Remote: Found 166446 segments
Remote: Checking segments 100%
Remote: Completed repository check, no problems found.
Starting archive consistency check...
Remote: Verified integrity of /backup/repo/index.179253
TAM-verified manifest
Analyzing archive darkskiez-2017-10-12T11:01:00 (1/9)
Remote: Cleaned up 0 uncommitted segment files (== everything after segment 179253).
Remote: Verified integrity of /backup/repo/hints.179253
Analyzing archive darkskiez-2017-10-18T10:22:54.checkpoint (2/9)
Archive metadata block is missing!
Analyzing archive darkskiez-2017-10-24T10:02:14 (3/9)
/srv/photos/IMG_2445.JPG: New missing file chunk detected (Byte 0-888561). Replacing with all-zero chunk.
/srv/photos/IMG_2445.JPG: New missing file chunk detected (Byte 888561-1153253). Replacing with all-zero chunk.
item metadata chunk missing [chunk: 000528_fe00c10f6ce4b733d013ee30a60f47284368e4fa47164cb0d8af507dc0acedf0]

It got here before in the 7 day run, so I assume it doesn’t do the fixing so well.

I’m going to delete this specific archive.

Day 21: For various reasons this check couldn’t finish either; it still takes too long.

I am very sad. I am resisting the strong urge to write my own backup system. Perhaps I should retry restic but with much smaller repos. Perhaps I should consider trying another system: knoxite looks better than restic on the box, but has nearly zero development activity or users. I refuse to believe it is mature on that evidence, but then again, it seems I was wholly wrong about borgbackup.

btrfs snapshots

Ok, maybe all this backup software is too fancy. Maybe btrfs is a fancy enough filesystem that I don’t need the extra things.

There are a few btrfs backup scripts, based on the principles in: https://btrfs.wiki.kernel.org/index.php/Incremental_Backup

So I have quite a lot of data I wish to “exclude” from my backup, and reorganizing my filesystem to have different (sub)volumes doesn’t sound like fun, especially when we are talking about lots of ‘cache’ dirs and what not.

I wonder if btrfs snapshot diffing is intelligent enough for this: if I snapshot filesystem A into S1 and S2 at different times, prune the files I don’t want from both S1 and S2, and then do a btrfs send of the delta between S1 and S2, will it work at all, and will it be intelligent enough not to include the deleted files?

Crude test time.

btrfs subvol create /home/snaptest
# Create a big file b1 (the sizes here are arbitrary)
dd if=/dev/urandom of=/home/snaptest/b1 bs=1M count=100
btrfs subvolume snapshot /home/snaptest /home/snaptest-s1
# Create a second big file b2 and a small file s
dd if=/dev/urandom of=/home/snaptest/b2 bs=1M count=100
echo hello > /home/snaptest/s
btrfs subvolume snapshot /home/snaptest /home/snaptest-s2
# Delete b2 from the snapshots (it only exists in s2; -f keeps rm quiet about s1)
rm -f /home/snaptest-s1/b2 /home/snaptest-s2/b2

btrfs send -p /home/snaptest-s1 /home/snaptest-s2|wc -c
ERROR: subvolume /home/snaptest-s1 is not read-only
# Woops, btrfs send only works on read-only snapshots
btrfs property set -ts /home/snaptest-s1 ro true
btrfs property set -ts /home/snaptest-s2 ro true

btrfs send -p /home/snaptest-s1 /home/snaptest-s2|wc -c
# Wahoo: only a bit bigger than the small file
btrfs sub del /home/snaptest*

There may be some hope here yet.
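Turned into a repeatable step, the crude test might look like the sketch below. `send_pruned_snapshot` is a hypothetical helper and all paths are placeholders; it assumes the previous snapshot was pruned the same way, is read-only, and already exists on the receiving side.

```shell
# Snapshot a subvolume, remove unwanted paths from the snapshot, then write
# an incremental send stream (relative to the previous snapshot) to stdout.
send_pruned_snapshot() {
    # usage: send_pruned_snapshot SUBVOLUME PREVIOUS_SNAPSHOT NEW_SNAPSHOT [EXCLUDED_PATH...]
    subvol=$1 prev=$2 new=$3
    shift 3

    # Writable snapshot first, so excluded files can be deleted from it.
    btrfs subvolume snapshot "$subvol" "$new" || return 1
    for excluded in "$@"; do
        rm -rf -- "$new/$excluded"
    done

    # btrfs send refuses snapshots that are not read-only.
    btrfs property set -ts "$new" ro true || return 1

    # -p names the parent, so only the difference from it is sent.
    btrfs send -p "$prev" "$new"
}
```

The stream could then be piped wherever it needs to go, for example `send_pruned_snapshot /home /home-s1 /home-s2 .cache | ssh pitunnel 'btrfs receive /backup/snapshots'`, assuming the remote user is allowed to run btrfs receive.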

If I encrypt the data that is being sent with ‘btrfs send’, that is secure but very awkward to use remotely in an emergency when I can’t restore the snapshots.
Perhaps layering it on something like https://nuetzlich.net/gocryptfs/ will do the job. If I really need to I can use that to mount the encrypted snapshots remotely.

My concerns mostly revolve around how to recover from partial data loss. If my btrfs base snapshot becomes corrupt, is there a way to recover data from a serialized snapshot diff? I’d like a btrfs receive --ignore-inconsistencies option, so I could apply a partial snapshot diff to a blank snapshot and get any new files that were included in the delta.