Hey all,
I am currently researching methods for quickly backing up my home directory.
So far I have been using DejaDup and it kind of worked out for me when my laptop got stolen recently.
The problem I see with DejaDup's and duplicity's file-system-agnostic approach is that they cannot know which files changed.
My home directory contains (among other things) my photo collection, which takes up around 300GB, and since I keep adding tags and whatnot, the files actually change marginally over time.
So with duplicity/DejaDup a backup took long enough to keep me from doing regular backups, since every picture had to be scanned.
I am at a point where I would be ok with "rolling my own" backup scripts to accomplish what I need but of course if anybody has pointers to projects that do what I want...
So maybe I should start with what I want from a backup:
- Reliable: I can actually restore the data
- Secure: I keep my hard drive encrypted and I see no reason to have an unencrypted copy of it lying around
- Consistent: On the off-chance that I perform major changes to my home directory while the backup runs I don't want to get an inconsistent view of it in the backup
- Multiple plans: A plan for me consists of
- source (which folders to backup)
- target (internal/external drive, ftp, rsync, ...)
- schedule (daily, weekly, ...)
- strategy for keeping/ditching old backups (e.g. keep the newest one, one from a week ago, one from a month ago and one from a year ago)
- Fast: I do not want the backup to take longer than 5 minutes if I didn't change a lot. Of course, if I decide to reorder my photo collection, I must live with a long backup for the next increment.
So far the solutions I have found do everything except the "fast" part:
In particular, Bacula seems to be very powerful, probably even more than I actually need.
Rsync/Tar/Cron/pgp seems to be capable of doing everything I need, too.
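For the record, here is roughly what I mean by that route — a minimal sketch, not a finished script. The SRC/DEST paths are placeholders (the /tmp defaults only exist so the sketch runs as-is); GNU tar's `--listed-incremental` keeps the increments cheap, because unchanged files only get stat()ed, not re-read:

```shell
#!/bin/sh
# Hedged sketch of the tar/cron/gpg route.  SRC/DEST are placeholders;
# the /tmp defaults just keep the demo runnable without a real backup drive.
SRC=${SRC:-/tmp/backup-demo/src}     # your real source, e.g. "$HOME"
DEST=${DEST:-/tmp/backup-demo/dest}  # your real target drive/mount
mkdir -p "$SRC" "$DEST"

SNAR="$DEST/state.snar"              # tar's incremental state file
STAMP=$(date +%Y%m%d-%H%M%S)

# Only files changed since the last run (per the .snar state) are stored;
# unchanged files are stat()ed but their contents are not re-read.
tar --listed-incremental="$SNAR" -cf "$DEST/backup-$STAMP.tar" -C "$SRC" .

# Encrypt before the archive leaves the machine, e.g.:
#   gpg --symmetric --cipher-algo AES256 "$DEST/backup-$STAMP.tar"
echo "wrote $DEST/backup-$STAMP.tar"
```

This covers "fast" in the sense of not re-reading every picture, but it still walks the whole tree to stat everything, which is exactly what the snapshot-diff idea below would avoid.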
I still really like duplicity, especially with the Horcrux wrapper it seems to be pretty versatile.
But none of these tools seem to make use of snapshot diffs like the ones btrfs offers.
I am running LVM, so I could use LVM snapshots, but I cannot find a way of quickly determining the 'candidate list' of changed files.
In general, snapshots seem to be used in the backup process to provide a consistent view of the file system while the backup runs, not so much for performance gains.
Now the idea would be to use a file system that supports taking snapshots and comparing them to each other.
Then I would:
- Take an initial file system snapshot S1.
- Every time I want to do an incremental backup:
- create a new snapshot S2
- use some file system voodoo to get a list of changes or at least of the changed files between S1 and S2
- use my backup script/application to only perform a backup of those files since no other files changed between the snapshots
- reassign S2 to be the new 'current state', i.e. S1 <-- S2
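On btrfs, the steps above could look roughly like this. The VOL/SNAPDIR paths are made up, and the `awk` parsing assumes `btrfs subvolume find-new` prints the path as the last field of each `inode ...` record (check against your btrfs-progs version before trusting it):

```shell
#!/bin/sh
# Sketch of the S1/S2 snapshot-diff cycle on btrfs.  Paths are hypothetical.
VOL=${VOL:-/home}                    # must be a btrfs subvolume
SNAPDIR=${SNAPDIR:-/home/.snapshots}

# Sanity check so the sketch fails gracefully on non-btrfs systems.
btrfs subvolume show "$VOL" >/dev/null 2>&1 \
  || { echo "$VOL is not a btrfs subvolume; adjust VOL first"; exit 0; }
mkdir -p "$SNAPDIR"

if [ ! -d "$SNAPDIR/S1" ]; then
  # Initial read-only snapshot S1; remember the generation it was taken at
  # (with an absurdly high transid, find-new prints only "transid marker was N").
  btrfs subvolume snapshot -r "$VOL" "$SNAPDIR/S1"
  btrfs subvolume find-new "$SNAPDIR/S1" 9999999 \
    | awk '{print $NF}' > "$SNAPDIR/S1.gen"
  exit 0
fi

# Incremental run: snapshot S2, diff against S1's recorded generation.
btrfs subvolume snapshot -r "$VOL" "$SNAPDIR/S2"
# One record per changed extent; path is the last field (paths containing
# spaces would need more careful parsing than this).
btrfs subvolume find-new "$SNAPDIR/S2" "$(cat "$SNAPDIR/S1.gen")" \
  | awk '/^inode/ {print $NF}' | sort -u > /tmp/changed-files.txt

# ... hand /tmp/changed-files.txt to the backup tool here ...

# Rotate: S1 <- S2.
btrfs subvolume delete "$SNAPDIR/S1"
mv "$SNAPDIR/S2" "$SNAPDIR/S1"
btrfs subvolume find-new "$SNAPDIR/S1" 9999999 \
  | awk '{print $NF}' > "$SNAPDIR/S1.gen"
```

One caveat I'm aware of: `find-new` only reports new or modified extents, so deletions and renames never show up in the list; the backup tool would still need its own catalog to notice removed files. If both ends were btrfs, `btrfs send -p S1 S2` would instead produce a complete binary diff stream (deletions included) that `btrfs receive` can replay.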
Does anybody know of tools to do that or something like that?
Or at least pointers to the commands I could use?
Which file systems support snapshot diffs? Currently I am running ext4. Changing to btrfs would take a night of work (for my laptop, not for me that is..) but would be possible.
Regarding duplicity, is there any way to pass in a candidate list of which files might have changed?
I see includes/excludes, but I am afraid that those have different semantics, i.e. the file list saved with the backup will change.
What I am looking for is more like "assume all files to be unchanged except those in the file I pass you as an argument."
Thanks all for your time.