My online backup provider, CrashPlan, cancelled its home user accounts to focus on business users. Now what?
Spoilers: This is unfinished at time of publishing. I’ll update this post as I progress towards a solution I’m happy with.
There aren’t any obvious Linux-compatible unlimited (priced per computer rather than per GB) online backup solutions left, and I didn’t fancy running one in a VM or any other jiggery-pokery like that.
Cloud providers would be ideal. BackBlaze B2 is a good candidate (https://www.backblaze.com/b2/cloud-storage-pricing.html), but Wasabi is the cheapest I’ve found (https://wasabi.com/pricing/). Even so, 8TiB over 3 years with one full restore works out to $1468, which is too expensive for my needs. I also suspect any cheaper provider will change or increase its pricing, or close, within a few years, and I don’t want to find a new solution every two years.
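The arithmetic behind an estimate like this is easy to sketch. The function below is illustrative only: its name and the rates in the example call are made-up placeholders, not either provider’s actual pricing.

```shell
#!/bin/sh
# cloud_cost SIZE_GB STORAGE_RATE MONTHS EGRESS_RATE RESTORES
# Prints the estimated total in dollars.
# Rates are per GB; the storage rate is per GB per month.
cloud_cost() {
    awk -v size="$1" -v rate="$2" -v months="$3" \
        -v egress="$4" -v restores="$5" 'BEGIN {
        storage  = size * rate * months      # cost of keeping the data
        download = size * egress * restores  # cost of pulling it all back
        printf "%.2f\n", storage + download
    }'
}

# Example with placeholder rates: 1000 GB kept for 36 months, one full restore.
cloud_cost 1000 0.005 36 0.04 1
```

Plugging in a provider’s real storage and egress rates makes it easy to compare against the one-off hardware cost below.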
Potential Solution: run my own offsite backup system.
Hardware List
$35 Raspberry Pi 3
$16 SD Card
$275 2 * WD RED 8TiB
Total $666
Seeding the backup onsite saves the usual ~month of initial backup nonsense.
Configuring the Pi
Power
It works on normal USB power, as it isn’t doing anything fancy enough to need the extra watts.
Operating system
Raspbian, no GUI, whatever was the latest release.
Basic WiFi Networking
The location has public WiFi; that will do.
#/etc/network/interfaces.d/wlan0
auto wlan0
iface wlan0 inet dhcp
    wireless-essid GuestWifiNoCrypto
VPN
My pi is installed in a firewalled location and can’t act as a server that my backup machine can reach.
Simplest VPN: ssh tunnel!
However, a plain ssh tunnel is very temporary. autossh will start ssh on boot and restart it if the connection fails, but it needs some encouragement to get the most robust config. The flag that bit me, and that was missing from other examples, is ExitOnForwardFailure: until I had that, the ssh VPN was reconnecting but not establishing the reverse tunnel when an old stale one hadn’t quite cleaned up.
homemachine$ sudo adduser --system pitunnel
# First step, create and install a ssh key for the inbound tunnel:
pi$ ssh-keygen -t rsa -b 4096
# Install key into my home machine.
pi$ ssh-copy-id pitunnel@my.dyndns.address
# Test the key, this should connect without a password
pi$ ssh pitunnel@my.dyndns.address
Here is my autossh setup.
# /lib/systemd/system/autossh.service
[Unit]
Description=tunnel
After=network.target
Wants=network-online.target
StartLimitIntervalSec=0

[Service]
User=pi
Restart=always
RestartSec=5
Environment=AUTOSSH_GATETIME=0
ExecStart=/usr/bin/autossh -M 0 -N -q -o "ServerAliveInterval 60" -o "ServerAliveCountMax 3" -o "ExitOnForwardFailure yes" -p 22 -l pitunnel my.dyndns.address -R 2200:127.0.0.1:22 -i /home/pi/.ssh/id_rsa

[Install]
WantedBy=multi-user.target
Now enable it, start it, and check it is running OK.
$ sudo systemctl enable autossh
$ sudo systemctl start autossh
$ sudo systemctl status autossh
● autossh.service - tunnel
   Loaded: loaded (/lib/systemd/system/autossh.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2017-10-xx 01:01:38 UTC; 0 days ago
 Main PID: 739 (autossh)
   CGroup: /system.slice/autossh.service
           ├─ 739 /usr/lib/autossh/autossh -M 0 -N -q -o ServerAliveInterval 60 -o ServerAliveCountMax 3 -o ExitOnForwardFailure yes -p 22 -l pitunnel my.dyndns.address -R 2200:127.0.0.1:22 -i /home/pi/.ssh/id_
           └─2748 /usr/bin/ssh -N -q -o ServerAliveInterval 60 -o ServerAliveCountMax 3 -o ExitOnForwardFailure yes -p 22 -l pitunnel -R 2200:127.0.0.1:22 -i /home/pi/.ssh/id_rsa my.dyndns.address
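A “running” service on the Pi does not by itself prove that the reverse forward is bound on the home machine. A minimal sketch of a check to run there, assuming `ss` from iproute2 is installed; the function name is made up for illustration, and 2200 is the forwarded port from the unit file.

```shell
#!/bin/sh
# tunnel_listening PORT
# Reads `ss -ltn` style output on stdin and succeeds if any socket is
# listening on PORT (the 4th column of that output is "Local Address:Port").
tunnel_listening() {
    awk -v port="$1" '
        NR > 1 { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
        END    { exit(found ? 0 : 1) }
    '
}

# On the home machine:
#   ss -ltn | tunnel_listening 2200 && echo "tunnel up" || echo "tunnel down"
```

Taking the port as the last colon-separated field means IPv6 listeners such as [::1]:2200 are matched as well.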
Set up a convenience ssh host target, so you don’t need all the usernames and ports in every ssh and scp command you want to use:
# /root/.ssh/config
Host pitunnel
    Hostname 127.0.0.1
    User borg
    Port 2200
    IdentityFile ~/.ssh/borg
Test it with
ssh pitunnel
It should connect with no password prompt.
Installing the disks
TODO I’ve not really finished this section.
I had initially used a btrfs raid1 mirror instead of Linux mdadm RAID, but one of the drives in my dock had connectivity issues, and btrfs coped with that badly:
- Running a btrfs balance took far, far too long (days). I have no idea why btrfs takes so long to rebalance; it is not IO speed limited. I’d like someone to benchmark a btrfs rebalance in RAM and see how slow it still is.
- It would crash when trying to mount after a bad reset, needing to be mounted from my PC (with more RAM / a different kernel / whatever else) in order to clean up the filesystem state. This happened more than once.
- It would force the filesystem read-only after one degraded mount once I took a disk out. This is supposedly fixed in Linux 4.14: http://lkml.iu.edu/hypermail/linux/kernel/1709.1/00609.html
I refuse to use a non check-summing filesystem, and zfs is out of the question for me right now (not open for discussion here, may work for you), so I’m sticking with btrfs, but ignoring the raid modes.
$EDITOR /etc/fstab
/dev/disk/by-label/borgbackup /backup btrfs defaults,noatime,nofail 0 2
Linux RAID jiggery: I’ve done this a squillion times, but haven’t actually set it up yet.
$ sudo apt install smartmontools
TODO: You should probably configure a mail relay so emailing from the pi works.
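Until mail works, a cron-friendly health check is one option. A minimal sketch, assuming ATA-style `smartctl -H` output; the function name and /dev/sda are placeholders, and drives in a USB dock may need an extra `-d sat` option.

```shell
#!/bin/sh
# smart_passed
# Reads `smartctl -H` output on stdin and succeeds only if the overall
# health line reports PASSED.
smart_passed() {
    grep -q 'overall-health self-assessment test result: PASSED'
}

# Example, with /dev/sda standing in for the real device:
#   sudo smartctl -H /dev/sda | smart_passed || echo "disk needs attention"
```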
watchdog
The systemd watchdog looks like an interesting framework, but it didn’t have an immediately obvious way of checking that the net connection was OK, so I opted for the lazy/easy watchdog package.
sudo apt install watchdog
$EDITOR /etc/watchdog.conf
# set the following lines
ping = 8.8.8.8 # google dns
interface = wlan0 # check wifi interface
watchdog-timeout = 10 # largest pi watchdog value, default of 60 errors
Misc Config
I had the Pi plugged into a spare HDMI port on a nearby monitor, to be able to check in on it from time to time, and into a USB switcher to swap the keyboard/mouse over.
A $6 USB KVM switcher from AliExpress was the cheapest I found; I just ignore the VGA ports.
Annoyingly, when my machine was locked, the Pi would occasionally reboot or steal the monitor’s input via HDMI.
#/boot/config.txt
hdmi_blanking=1
This setting is supposed to fix it, at the cost of breaking media player software horribly, which is OK, as I don’t use any on this Pi. I’ll update this with my experience.
Backup software
restic
I first tried restic. It ticked most of my boxes: Go, encrypted, authenticated, de-duplicating, incremental. However, it seems to have trouble pruning over slow links, and it ran out of memory when I tried to prune from the Pi itself. It looks like it has scope to grow into a solid backup tool, but it doesn’t quite meet my low-memory / high-latency needs at the moment.
Issues I ran into:
- No snapshot usage stats: https://github.com/restic/restic/issues/693
- Random crash: https://github.com/restic/restic/issues/1251
- Poor errors with SFTP backend: https://github.com/restic/restic/issues/1323
- Very little progress output on prune: https://github.com/restic/restic/issues/1322
- No network stats: https://github.com/restic/restic/issues/1033
The restic author maintains a detailed feature list of alternative backup software, which I can highly recommend to anyone exploring backup systems.
The most mature alternative candidate seems to be borgbackup. I didn’t really need the compression that borgbackup adds, and I didn’t want the overhead that Python brings, but it seems to tick most boxes.
borgbackup
I used the borgbackup 1.1.1 release from pip, as it fixes some issues I had with clearing out stale locks (after the watchdog resets the Pi) and Raspbian hasn’t updated yet.
Version 1.1.0 managed to crash my host machine due to a bug where it opened every block device, and possibly a secondary bug in my kernel. The kernel side needs further investigation: https://github.com/borgbackup/borg/issues/3213
$ sudo pip3 install borgbackup
I dropped the script from the borg backup “Quick Start – Automating Backups” section into /etc/cron.daily/borgbackup. I added the -x flag to stay on one filesystem, and then added the paths to the mounts I wanted.
export BORG_REPO=ssh://pitunnel/~/repo
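A cut-down sketch of what such a cron script can look like. This is not the Quick Start script verbatim: the function name, the overridable BORG variable, the example paths and the retention numbers are all placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch of /etc/cron.daily/borgbackup.
# BORG can be overridden (e.g. BORG=echo) to print the commands instead of
# running them.
BORG="${BORG:-borg}"
export BORG_REPO="${BORG_REPO:-ssh://pitunnel/~/repo}"

run_backup() {
    # --one-file-system (-x) stops borg crossing into other mounted
    # filesystems, so every mount to back up must be listed in "$@".
    "$BORG" create --stats --one-file-system "::{hostname}-{now}" "$@"
    create_rc=$?

    # Placeholder retention policy; only archives with this host's prefix
    # are considered for pruning.
    "$BORG" prune --prefix '{hostname}-' \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 6
    prune_rc=$?

    # Report the worse of the two exit codes (borg: 1 = warning, 2 = error).
    [ "$create_rc" -gt "$prune_rc" ] && return "$create_rc"
    return "$prune_rc"
}

# Example with hypothetical mount points:
#   run_backup /home /srv
```

Running it once with BORG=echo is a cheap way to eyeball the exact command lines before letting cron loose on the repo.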
Oops, I slightly corrupted my repo with the watchdog resets and lock fudgery; the prune was crashing with a missing directory.
A single-threaded, CPU-bound, multiple-day-long Python check… awarghgjbwakwffkawkfkafhgh. It’s still not finished, I have little idea about its progress (it has spent all its time on Analyzing archive “hostname-2017-10-18T11:44:25.checkpoint (3/19)”), and I have 15 idle cores.
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
22288 root      20   0 1000796 897456   6984 R 100.0  5.5   3844:24 /usr/bin/python3 /usr/bin/borg check --verbose --repair
Day 7: This breached 10000 minutes (7 days) of runtime, so I’ve shut it down.
Day 14: I forgot to check up on it for a while; miraculously, it’s been running for a week, making backups every day! Hooray!
But the repo isn’t “healthy”.
$ borg delete $BORG_REPO::darkskiez-2017-10-23T23:05:06.checkpoint
borg.repository.Repository.ObjectNotFound: Object with key fe00c10f6ce4b733d013ee30a60f47284368e4fa47164cb0d8af507dc0acedf0 not found in repository ssh://pitunnel/~/repo::darkskiez-2017-10-23T23:05:06.checkpoint.
Sod it, I’ll force it and nuke the lot of them.
borg list --short|grep checkpoint|xargs -i borg delete --force $BORG_REPO::{}
forced deletion succeeded, but the deleted archive was corrupted.
borg check --repair is required to free all space.
Argh! borg list still shows a checkpoint file… and borg delete --force is erroring with the same not found error. How hard is it to delete something that isn’t there?!!?
We know how well borg check worked last time, but back then I really didn’t want it to repair these checkpoints. Maybe now it will work; I don’t have high hopes.
using builtin fallback logging configuration
35 self tests completed in 0.08 seconds
SSH command line: ['ssh', 'pitunnel', 'borg', 'serve', '--umask=077', '--debug']
Remote: using builtin fallback logging configuration
Remote: 35 self tests completed in 0.72 seconds
Remote: using builtin fallback logging configuration
Remote: Initialized logging system for JSON-based protocol
Remote: Resolving repository path b'/~/repo'
Remote: Resolved repository path to '/backup/repo'
'check --repair' is an experimental feature that might result in data loss.
Type 'YES' if you understand this and want to continue: YES
Remote: Starting repository check
Remote: Verified integrity of /backup/repo/index.179252
Remote: Read committed index of transaction 179252
Remote: Cleaned up 0 uncommitted segment files (== everything after segment 179252).
Remote: Segment transaction is 179252
Remote: Determined transaction is 179252
Remote: Found 166446 segments
Remote: Checking segments 100%
Remote: Completed repository check, no problems found.
Starting archive consistency check...
Remote: Verified integrity of /backup/repo/index.179253
TAM-verified manifest
Analyzing archive darkskiez-2017-10-12T11:01:00 (1/9)
Remote: Cleaned up 0 uncommitted segment files (== everything after segment 179253).
Remote: Verified integrity of /backup/repo/hints.179253
Analyzing archive darkskiez-2017-10-18T10:22:54.checkpoint (2/9)
Archive metadata block is missing!
Analyzing archive darkskiez-2017-10-24T10:02:14 (3/9)
/srv/photos/IMG_2445.JPG: New missing file chunk detected (Byte 0-888561). Replacing with all-zero chunk.
/srv/photos/IMG_2445.JPG: New missing file chunk detected (Byte 888561-1153253). Replacing with all-zero chunk.
item metadata chunk missing [chunk: 000528_fe00c10f6ce4b733d013ee30a60f47284368e4fa47164cb0d8af507dc0acedf0]
It got here before in the 7 day run, so I assume it doesn’t do the fixing so well.
I’m going to delete this specific archive.
Day 21: For various reasons this check couldn’t finish either; it still takes too long.
I am very sad. I am resisting the strong urge to write my own backup system. Perhaps I should retry restic, but with much smaller repos. Perhaps I should consider trying another system… knoxite looks better than restic on the box, but has nearly zero development activity / users. I refuse to believe it is mature on that evidence, but then again, it seems I was wholly wrong about borg backup.
btrfs snapshots
OK, maybe all this backup software is too fancy. Maybe btrfs is a fancy enough filesystem that I don’t need the extra things.
There are a few btrfs backup scripts, based on the principles in: https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
However, I have quite a lot of data I wish to “exclude” from my backup, and reorganizing my filesystem to have different (sub)volumes doesn’t sound like fun, especially when we are talking about lots of ‘cache’ dirs and whatnot.
I wonder whether btrfs snapshot diffing is intelligent enough for this: if I snapshot filesystem A into S1 and S2 at different times, prune the files I don’t want from both S1 and S2, and then do a btrfs send of the delta between S1 and S2, will it work at all, and will it be intelligent enough not to include the deleted files?
Crude test time.
btrfs subvol create /home/snaptest
# Create a bigfile1 /home/snaptest/b1
btrfs subvolume snapshot /home/snaptest /home/snaptest-s1
# Create a bigfile2 /home/snaptest/b2
# Create a small file /home/snaptest/s
btrfs subvolume snapshot /home/snaptest /home/snaptest-s2
# Delete b2 from /home/snaptest-s1/b2 and /home/snaptest-s2/b2
btrfs send -p /home/snaptest-s1 /home/snaptest-s2|wc -c
ERROR: subvolume /home/snaptest-s1 is not read-only
# Woops
btrfs property set -ts /home/snaptest-s1 ro true
btrfs property set -ts /home/snaptest-s2 ro true
btrfs send -p /home/snaptest-s1 /home/snaptest-s2|wc -c
# Wahoo: only a bit bigger than smallfile
btrfs sub del /home/snaptest*
There may be some hope here yet.
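If the crude test holds up, the snapshot-prune-freeze steps could be wrapped up roughly as below. This is an untested sketch: the function name, the RUN dry-run variable and every path in the usage comments are hypothetical, and the incremental send only works if the parent snapshot already exists on the receiving side.

```shell
#!/bin/sh
# RUN is prepended to every command; set RUN=echo to print the commands
# instead of executing them.
RUN="${RUN:-}"

# pruned_snapshot SRC SNAP [EXCLUDE...]
# Snapshot SRC to SNAP, delete the EXCLUDE paths (relative to SNAP) from the
# snapshot only, then mark it read-only so `btrfs send` will accept it.
pruned_snapshot() {
    src=$1
    snap=$2
    shift 2
    $RUN btrfs subvolume snapshot "$src" "$snap" || return 1
    for path in "$@"; do
        # ${snap:?} aborts rather than running rm against "/" if snap is empty.
        $RUN rm -rf -- "${snap:?}/$path" || return 1
    done
    $RUN btrfs property set -ts "$snap" ro true
}

# Hypothetical usage, sending only the delta since the previous snapshot:
#   pruned_snapshot /home /snapshots/home-new .cache
#   btrfs send -p /snapshots/home-old /snapshots/home-new |
#       ssh pitunnel btrfs receive /backup/snapshots
```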
If I encrypt the data that is being sent with ‘btrfs send’, that is secure, but very awkward to use remotely in an emergency when I can’t restore the snapshots.
Perhaps layering it on something like https://nuetzlich.net/gocryptfs/ will do the job. If I really need to I can use that to mount the encrypted snapshots remotely.
My concerns mostly revolve around how to recover from partial data loss. If my btrfs base snapshot becomes corrupt, is there a way to recover data from a serialized snapshot diff? I’d like a btrfs receive --ignore-inconsistencies option, so I could apply a partial snapshot diff to a blank snapshot and get any new files that were included in the delta.