Howto: Back up your entire system using TAR



newb85
August 28th, 2012, 12:16 PM
Preface
The overall concept for this tutorial came from the howto by Heliode. My reasons for creating a new howto are several. His tutorial was created seven years ago, when Hoary was the latest release. He hasn't been active on the forums in nearly five years. He leaves out critical steps, resulting in a method that is not reliable. Moreover, that thread has nearly 1400 replies, many of which have nothing to do with using tar to back up a system.
I give this tutorial two objectives: to familiarize you with the tar command, and then to expound on approaches to backing up your system using tar. Please note that using tar to back up an entire system is not for the faint of heart, nor is it something to be taken lightly. The behavior of tar is not always intuitive, and wrong assumptions can lead to the loss of data.
For that reason several important nuances of tar are demonstrated within the examples. These examples are for educational purposes, and I expect it would be beneficial for the average reader to run them. They only affect a sample directory within the /home directory, so if they're run correctly, they shouldn't affect the rest of the system.
However, I will not provide lines of code that I expect readers to copy-and-paste to back up their entire system. My examples and suggestions will have to be tailored to meet your specific situation and needs.
I start with the assumption that the reader is familiar and somewhat comfortable with executing commands in the terminal. To those who are not, I recommend reading the documentation here: https://help.ubuntu.com/community/UsingTheTerminal

Introduction to TAR
tar is a simple yet very powerful archiving tool. It stands for Tape ARchiver, which should tell you how long it's been around. (Tape drives fell into disuse some while ago...) If you've ever downloaded a file you had to extract (or “unzip”, to use Windows terminology) in Ubuntu, you probably used Archive Manager, which is built on tar.
When the tar command is issued, it requires one “option” to set the operation mode. There are nine operation modes, but I will only cover three of them. All options of tar are in the man page, which can be viewed by issuing the command man tar.

Create (create an archive)
Function Letter: c
Long handle: --create

Extract (extract or unpack an archive)
Function Letter: x
Long handles: --extract --get

List (spits out a list of all the contents of an archive)
Function Letter: t
Long handle: --list

For demonstration purposes, I create a folder tartest with a few sample documents.


aaron@aaron-Satellite-P755:~$ mkdir Documents/tartest
aaron@aaron-Satellite-P755:~$ cd Documents/tartest
aaron@aaron-Satellite-P755:~/Documents/tartest$ touch acerbic.odt biting.odp \
> scathing.txt tangy.ods
aaron@aaron-Satellite-P755:~/Documents/tartest$ mkdir piquant
aaron@aaron-Satellite-P755:~/Documents/tartest$ touch piquant/acidulous \
> piquant/trenchant
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls
acerbic.odt biting.odp piquant scathing.txt tangy.ods
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls piquant
acidulous trenchant
My goal is to back up tartest to an archive stored in a directory named backup. Because backup will itself sit inside tartest, it has to be excluded so that the archive doesn't try to include itself.

Exclude (exclude a file matching PATTERN from the archive)
Long handle: --exclude PATTERN

File (designate the archive file—yeah, this one's important)
Function letter: f
Long handle: --file

Preserve permissions (leave dates, ownership, and permissions unchanged)
Function letter: p
Long handles: --preserve-permissions --same-permissions

In the following steps, I'll create the directory backup, then create an archive of everything in tartest except for backup, delete the file scathing.txt, then restore tartest from the archive (thereby restoring scathing.txt).


aaron@aaron-Satellite-P755:~/Documents/tartest$ mkdir backup
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar cpf backup/backup1.tar \
> --exclude backup .
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar tf backup/backup1.tar
./
./tangy.ods
./piquant/
./piquant/trenchant
./piquant/acidulous
./acerbic.odt
./scathing.txt
./biting.odp
aaron@aaron-Satellite-P755:~/Documents/tartest$ rm scathing.txt
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls
acerbic.odt backup biting.odp piquant tangy.ods
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar xpf backup/backup1.tar
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls
acerbic.odt backup biting.odp piquant scathing.txt tangy.ods
Success! Everything worked as expected. Now I delete tangy.ods and create caustic.odt, followed by another restore of tartest.
aaron@aaron-Satellite-P755:~/Documents/tartest$ rm tangy.ods
aaron@aaron-Satellite-P755:~/Documents/tartest$ touch caustic.odt
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls
acerbic.odt backup biting.odp caustic.odt piquant scathing.txt
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar xpf backup/backup1.tar
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls
acerbic.odt backup biting.odp caustic.odt piquant scathing.txt tangy.ods
Notice that after the restore, caustic.odt persists. By default, tar doesn't do any comparing. In extract mode, it simply creates every file represented within the archive. An effective backup and restore method needs to keep track of both what should be there and what should not be there.

Listed Backups

Listed Incremental (handle incremental backups based on list FILE)
Function Letter: g
Long Handle: --listed-incremental FILE

I won't address incremental backups yet, but for the moment the important thing to note is that this option changes the behavior of tar so that it uses a list file for comparison, and in extract mode removes the files that shouldn't be there. Observe.


aaron@aaron-Satellite-P755:~/Documents/tartest$ tar cpf backup/backup2.tar \
> -g backup/backup2.snar --exclude backup .
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar tf backup/backup2.tar
./
./piquant/
./acerbic.odt
./biting.odp
./caustic.odt
./scathing.txt
./tangy.ods
./piquant/acidulous
./piquant/trenchant
aaron@aaron-Satellite-P755:~/Documents/tartest$ rm biting.odp
aaron@aaron-Satellite-P755:~/Documents/tartest$ touch snappish.odp
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls
acerbic.odt caustic.odt scathing.txt tangy.ods
backup piquant snappish.odp
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar xpf backup/backup2.tar \
> -g backup/backup2.snar
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls
acerbic.odt backup biting.odp caustic.odt piquant scathing.txt tangy.ods

Multiple Exclusions
There's another limitation of tar that I've found to be very poorly documented. tar is not very good at accepting multiple --exclude options. I haven't found documentation giving a limit to how many it will accept, and in my experience, the number it accepts is inconsistent. The result of giving too many is that the last --exclude options are ignored with no warnings. When excluding multiple files or directories, I recommend using one or both of the following alternatives.

Exclude-from (excludes files listed in FILE from the archive)
Function letter: X (not to be confused with x)
Long handle: --exclude-from FILE

Exclude-tag-all (excludes all directories containing FILE from the archive)
Long handle: --exclude-tag-all FILE

To demonstrate the Exclude-from option, I've added the file exclusions.txt in the backup directory to exclude the directory piquant.

To demonstrate the Exclude-tag-all option I will add the file backup.excl.tag to the directory backup.
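
The contents of exclusions.txt aren't shown in this post. As a minimal sketch of how such a file could look, assuming a single pattern ./piquant and a throwaway sandbox from mktemp instead of the real tartest (the names work and demo.tar are made up for illustration):

```shell
# Sandbox: a throwaway copy of part of the tartest layout (hypothetical).
work=$(mktemp -d)
cd "$work"
mkdir piquant backup
touch acerbic.odt scathing.txt piquant/acidulous piquant/trenchant

# One pattern per line; ./piquant is the directory as tar stores it
# when "." is the thing being archived.
printf '%s\n' './piquant' > backup/exclusions.txt

# In this sketch the backup directory is still handled by --exclude.
tar cpf backup/demo.tar -X backup/exclusions.txt --exclude ./backup .
tar tf backup/demo.tar
```

Because the directory itself matches the pattern, tar never descends into it, so nothing under piquant should show up in the listing.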


aaron@aaron-Satellite-P755:~/Documents/tartest$ touch backup/backup.excl.tag
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar cpf backup/backup3.tar \
> -g backup/backup3.snar -X backup/exclusions.txt \
> --exclude-tag-all backup.excl.tag .
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar tf backup/backup3.tar
./
./acerbic.odt
./biting.odp
./caustic.odt
./scathing.txt
./tangy.ods
Both exclude-from and exclude-tag-all have their merits and drawbacks. Exclude-tag-all requires that a tag file be placed in each folder to be excluded. When backing up an entire system, some system directories need to be excluded (to be discussed later), and it's not a good idea to add a tag file in those directories. For those, exclude-from is invaluable. However, there may be directories that one only wants to exclude sometimes. In those cases, those folders may be tagged with descriptive tag files that can be used when appropriate (also to be discussed later).

Compression
Now, all this is well and good as far as backing up a small sample directory goes. However, when backing up something the size of an entire system, drive space has potential to be an issue. Fortunately, tar has many built-in compression options. The two most widely used are gzip and bzip2, but one can read about the rest on the man page.

Gzip (use the gzip compression format)
Function letter: z (g is already taken by listed-incremental)
Long handle: --gzip
(Note that for gzip files, the suffixes .tar.gz and .tgz are both acceptable, but .tgz is more commonly used.)

Bzip (use the bzip2 compression format)
Function letter: j
Long handle: --bzip2
(Note that for bzip2 files, the suffixes .tar.bz2 and .tbz are both acceptable. A plain .tar.bz suffix is not one that auto-compress recognizes.)

Auto-compress (uses the compression format indicated by the file suffix)
Function letter: a
Long handle: --auto-compress

In general, it's a good idea to use auto-compress. tar would be more than happy to create an archive with the wrong suffix (or even no suffix) if you told it to, which could lead to confusion down the road. Auto-compress ensures that the file suffix and the actual compression method match.
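
To see the suffix mismatch for yourself, here is a small sketch run in a throwaway mktemp directory (the names data, plain.tgz and auto.tgz are made up). It uses gzip -t, which tests whether a file really is valid gzip data:

```shell
work=$(mktemp -d)
cd "$work"
mkdir data
echo "sample" > data/file.txt

# No compression option: a plain tar file, despite the .tgz name.
tar cf plain.tgz data

# With -a, tar picks gzip because of the .tgz suffix.
tar caf auto.tgz data

# gzip -t exits non-zero for a file that is not gzip data.
gzip -t plain.tgz 2>/dev/null && echo "plain.tgz: gzip" || echo "plain.tgz: not gzip"
gzip -t auto.tgz 2>/dev/null && echo "auto.tgz: gzip" || echo "auto.tgz: not gzip"
```

The first archive should be reported as not gzip and the second as gzip, even though both carry the same suffix.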


aaron@aaron-Satellite-P755:~/Documents/tartest$ tar capf backup/backup4.tgz \
> -g backup/backup4.snar -X backup/exclusions.txt \
> --exclude-tag-all backup.excl.tag .
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar tf backup/backup4.tgz
./
./acerbic.odt
./biting.odp
./caustic.odt
./scathing.txt
./tangy.ods

Incremental Backups
Earlier we started using the listed-incremental option because it created an index so that the exact state of the file system could be restored. However, there are a couple more benefits to this option.
First, creating a backup that is a snapshot of the entire system can tie up resources for a while and take a good chunk of storage space. Much of this time and space can be saved by creating an incremental backup—a backup that only tracks the changes from a previous backup.
In my previous examples creating an archive with -g, the index files were files that did not yet exist. However, if the index file already exists tar's behavior changes. It starts by comparing the index file with the system, and any files that have been added or modified in the system are placed in a new archive. Then, the index file is updated to reflect the current state of the system.


aaron@aaron-Satellite-P755:~/Documents/tartest$ touch pungent.odt
aaron@aaron-Satellite-P755:~/Documents/tartest$ cp backup/backup4.snar \
> backup/backup5.snar
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar capf backup/backup5.tgz \
> -g backup/backup5.snar -X backup/exclusions.txt \
> --exclude-tag-all backup.excl.tag .
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar tf backup/backup5.tgz
./
./pungent.odt
aaron@aaron-Satellite-P755:~/Documents/tartest$ rm pungent.odt biting.odp
aaron@aaron-Satellite-P755:~/Documents/tartest$ touch acetose.ods
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls
acerbic.odt acetose.ods backup caustic.odt piquant scathing.txt tangy.ods
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar xpf backup/backup4.tgz \
> -g backup/backup4.snar
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls
acerbic.odt backup biting.odp caustic.odt piquant scathing.txt tangy.ods
aaron@aaron-Satellite-P755:~/Documents/tartest$ tar xpf backup/backup5.tgz \
> -g backup/backup5.snar
aaron@aaron-Satellite-P755:~/Documents/tartest$ ls
acerbic.odt biting.odp piquant scathing.txt
backup caustic.odt pungent.odt tangy.ods
The most common incremental backup paradigm is chain incremental backups, where each incremental backup is the difference from the previous backup. An alternative is baseline incremental backups, where each incremental backup is the difference from the last full backup. The baseline paradigm will take a little longer to back up the further you get from the last full backup, but it has the advantage that restoring your system will only require extraction of the last full backup and the last incremental backup.
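
As a rough sketch of the baseline paradigm, assuming a sandbox from mktemp and made-up names (src, full.tar, incr1.tar, incr2.tar): the trick is to hand tar a fresh copy of the full backup's index each time, so the original index never moves forward. The -C option, which makes tar change directory before reading files, is used here only to keep the sketch short.

```shell
work=$(mktemp -d)
cd "$work"
mkdir src backup
touch src/one.txt

# Full backup: full.snar does not exist yet, so everything is archived.
tar cpf backup/full.tar -g backup/full.snar -C src .

# Baseline increment 1: work on a COPY of the full backup's index.
touch src/two.txt
cp backup/full.snar backup/incr1.snar
tar cpf backup/incr1.tar -g backup/incr1.snar -C src .

# Baseline increment 2: again start from a copy of full.snar, not incr1.snar.
touch src/three.txt
cp backup/full.snar backup/incr2.snar
tar cpf backup/incr2.tar -g backup/incr2.snar -C src .

# Each increment holds everything added since the full backup.
tar tf backup/incr2.tar
```

For a chain instead, each step would copy the previous increment's index rather than full.snar.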

Housekeeping
In all examples, I was very careful to exclude directories, not just the contents of the directories. When using listed backups, this is important, because otherwise extraction will remove some or all of the contents of those directories.
I also ran all tar commands from within the top directory to be backed up (tartest). Had they been run from a different working directory, the paths stored in the resulting .tar files would look different, and the archives would extract to different locations.
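
To illustrate how the working directory changes what ends up in the archive, here is a small sketch in a mktemp sandbox (the archive names inside.tar, parent.tar and with-C.tar are made up). It also shows tar's -C option, which changes directory before reading files and is one way to get the same layout without cd:

```shell
work=$(mktemp -d)
mkdir -p "$work/tartest/piquant"
touch "$work/tartest/acerbic.odt" "$work/tartest/piquant/acidulous"

# Run from inside the directory: members are stored as ./...
(cd "$work/tartest" && tar cf "$work/inside.tar" .)

# Run from the parent: every member carries a tartest/ prefix.
(cd "$work" && tar cf "$work/parent.tar" tartest)

# -C switches directory before reading members, giving the first
# layout without leaving the current directory.
tar cf "$work/with-C.tar" -C "$work/tartest" .

tar tf "$work/inside.tar"
tar tf "$work/parent.tar"
tar tf "$work/with-C.tar"
```

Whichever form you pick, the restore has to be run from the matching directory, or the files will land somewhere unexpected.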

Things to know about backing up an entire system
Of course, creating a backup of your system within your system provides limited security. A HDD crash could wipe out your system and your backup in one fell swoop. System backups should be placed outside the system partition, or better yet, on an entirely different HDD. This will require you to learn the path to the backup location. On the other hand, having the backup directory outside the system will mean you don't need to worry about excluding it.
Before moving on from a test folder to the entire system, there are a few things to know. First, normal users don't by default have permissions to create or modify files outside their home directory, so commands will need to be executed as root. You could change the permissions of your backup directory to allow everyone to write to it. That way, root permissions would only be needed for restoring your system.
In modern Linux distros /proc, /sys, and /dev are all virtual filesystems. This concept is similar to a mounted partition, but in the case of these three directories, the contents (which don't really exist on any hard drive) are merely a way of presenting information from the kernel, information that exists in RAM. (If one were to boot from a live CD and examine these directories on the hard drive, they would find them completely empty.) It is not necessary to back up or restore the contents of these directories; moreover, it is not a good idea to add to or modify them. They should therefore be excluded from all archives. (These are the folders I mentioned should not be excluded using exclude-tag-all.)
I have read some recommendations that the /lost+found directory also be excluded. I disagree. The pedant's argument is that it would be chronologically incorrect to exclude it. The pragmatist's argument is that in the unlikely event that the disk check puts a file in /lost+found, it's unlikely it will be modified or removed (or needed, for that matter). Really, at the end of the day, whether /lost+found is excluded will be inconsequential.
One should always be mindful of what partitions and drives are being backed up and restored. Typically, other HDD partitions are mounted to the /mnt directory. Removable media (such as CDs and USB drives) are mounted to the /media directory. You will probably want to exclude both from your backups. Note that while it is possible to include Linux installs on other partitions, including Windows or Mac installs is futile. Their closed-system formats make it impossible to even read many system files. Backup software intended for those operating systems will have to be used.
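
The exclusions discussed above could be collected in an exclude-from file. The sketch below is not a system backup command: it runs against a mock directory tree created with mktemp, and the names root, dest, system-exclusions.txt and mock-system.tar are all made up. It only shows how the patterns line up with the paths tar stores when the tree is archived as "." via -C.

```shell
# Mock root tree (hypothetical stand-in for /) and a separate destination.
root=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$root/proc" "$root/sys" "$root/dev" "$root/mnt" "$root/media" \
         "$root/etc" "$root/home/user"
touch "$root/proc/cpuinfo" "$root/etc/fstab" "$root/home/user/notes.txt"

# One pattern per line, written the way tar stores members for "-C DIR ."
cat > "$dest/system-exclusions.txt" <<'EOF'
./proc
./sys
./dev
./mnt
./media
EOF

tar cpf "$dest/mock-system.tar" -X "$dest/system-exclusions.txt" -C "$root" .
tar tf "$dest/mock-system.tar"
```

Since the directories themselves are left out, they would have to exist already as empty mount points on whatever you restore to. Your own list will depend on how your partitions are laid out.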
If you're restoring to the same HDD with all the partitions unchanged, everything should be up and running. Otherwise, you may have to restore your grub (recommended reading: http://ubuntuforums.org/showpost.php?p=117829&postcount=2) and fix your fstab to reflect the new situation.

Suggestions
I will wrap up with a few ideas to consider:


Given that compression is ineffective on most media files, you could back up folders like ~/Music and ~/Videos in separate archives.
If you plan to back up regularly, creating a shell script for backing up would save time and cut down the risk of entering the command wrong.
You could also use cron to schedule the backups.
The first time you create a backup of the entire system, it would be a good idea to check the backup to be sure it contains what it's supposed to. The file will be large, and depending on your system it may take Archive Manager a long time to open. As an alternative, you could run tar in list mode and have it output to a .txt file.
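
For the last suggestion, a minimal sketch of sending the list-mode output to a text file, using a small sample archive in a mktemp sandbox (sample.tgz and sample-contents.txt are made-up names):

```shell
work=$(mktemp -d)
cd "$work"
mkdir data
touch data/a.txt data/b.txt
tar caf sample.tgz data

# List mode (t) prints member names to stdout; redirect them to a file.
tar tf sample.tgz > sample-contents.txt

# The text file can then be searched without reopening the archive.
grep 'b.txt' sample-contents.txt
```

On a full system archive the listing will be long, so grep or a text editor's search is likely to be more practical than scrolling.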