
Thread: Duplicated files in find | tar backup

  1. #1
    Join Date
    Apr 2005
    Beans
    50

    Duplicated files in find | tar backup

    I wish to back up a file structure which has spaces in paths and filenames, retaining the structure.

    I am testing with this structure:

    home/kevin/Test/Source Directory/
    home/kevin/Test/Source Directory/Directory One/
    home/kevin/Test/Source Directory/Directory One/Document Two.odt
    home/kevin/Test/Source Directory/Directory One/Document One.odt

    This is my script, which resulted from much googling:

    Code:
    #!/bin/bash
    find '/home/kevin/Test/Source Directory' -print0 | xargs -0 tar -czpf /home/kevin/Test/Backups.tgz --null
    read -p "Press ENTER to quit"
    Tar outputs these messages:

    tar: Removing leading `/' from member names
    tar: Removing leading `/' from hard link targets

    When I explore Backups.tgz using Archive Manager, the directory structure is correct, but each file is accompanied by two further copies with the same name and a size of zero bytes.

    When I run my restore script:

    Code:
    #!/bin/bash
    tar zxvf /home/kevin/Test/Backups.tgz -C /home/kevin/Test/Restored
    Tar displays:

    home/kevin/Test/Source Directory/
    home/kevin/Test/Source Directory/Directory One/
    home/kevin/Test/Source Directory/Directory One/Document Two.odt
    home/kevin/Test/Source Directory/Directory One/Document One.odt
    home/kevin/Test/Source Directory/Directory One/
    home/kevin/Test/Source Directory/Directory One/Document Two.odt
    home/kevin/Test/Source Directory/Directory One/Document One.odt
    home/kevin/Test/Source Directory/Directory One/Document Two.odt
    home/kevin/Test/Source Directory/Directory One/Document One.odt

    ... which shows the repeated files, but the restored directory structure is correct, having only one of each file.

    Although the end result is as it should be, I would rather the internal structure of the backup file were correct.

    I am sure there must be an error in my backup script.

    Thanks for any advice.

    Kevin

  2. #2
    Join Date
    May 2007
    Location
    Leeds, UK
    Beans
    1,675
    Distro
    Ubuntu

    Re: Duplicated files in find | tar backup

    What happens if you use "-type f" as an option to find?
    Please create new threads for new questions.
    Please wrap code in code tags using the '#' button or enter it in your post like this: [code]...[/code].

  3. #3
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: Duplicated files in find | tar backup

    There are lots of ways to back up. Tar is old-school ... which is fine, if it meets your needs.

    Doesn't this work?
    Code:
    $ tar cvzf  /home/kevin/Test/Backups.tgz  "/home/kevin/Test/Source Directory"
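    Tar removes the leading / as a safety measure, so that the archive extracts relative to the current directory instead of overwriting files at their absolute paths. If you run this from your ~/Test/ directory, then using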
    Code:
    $ tar cvzf  Backups.tgz  "Source Directory"
    should work with relative paths.
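    A minimal sketch of the same idea without having to cd first, assuming GNU tar: the -C option makes tar change directory before adding members, so names are stored relative to that directory and there is no leading / to strip. The scratch tree built with mktemp is hypothetical demo data standing in for /home/kevin/Test.

```bash
#!/bin/bash
# Sketch: archive a directory whose name contains spaces, storing
# relative member names. $base is a hypothetical stand-in for ~/Test.
set -euo pipefail

base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir -p "$base/Source Directory/Directory One"
echo "one" > "$base/Source Directory/Directory One/Document One.odt"
echo "two" > "$base/Source Directory/Directory One/Document Two.odt"

# -C makes tar change into $base before adding "Source Directory",
# so no leading '/' is stored and no warning is printed.
tar -czf "$base/Backups.tgz" -C "$base" "Source Directory"

# Show the member names that were stored.
tar -tzf "$base/Backups.tgz"
```

    Restoring with tar -xzf Backups.tgz -C some/target then recreates "Source Directory" directly under the target.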

    Is there a reason you don't want to use DejaDup, Back-In-Time, rdiff-backup, rsnapshot or rbackup instead? These all create backups that use storage more efficiently, thanks to librsync and hard links. Further, running one daily creates daily snapshots containing just the changed data, while letting you restore from a backup from today, yesterday, last week or last month ... whatever your retention needs might be. Just offering alternatives; tar is fine.

  4. #4
    Join Date
    Apr 2005
    Beans
    50

    Re: Duplicated files in find | tar backup

    r-senior:

    '-type f' fixed the problem - I'm curious as to why it is necessary. My script is now:
    Code:
    find '/home/kevin/Test/Source Directory' -type f -print0 | xargs -0 tar -czpf /home/kevin/Test/Backups.tgz --null
    The only message from tar is now
    tar: Removing leading `/' from member names.

    Backups.tgz now appears as it should in Archive Manager and the paths displayed by tar when restoring are as expected.
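    A further hedge for the script above, assuming GNU tar: tar's --null option only affects file lists read with -T, so in the xargs pipeline it does nothing, and xargs may split a long file list across several tar runs, each of which recreates the archive with -c and keeps only the last batch. A sketch that avoids both by letting a single tar read find's NUL-separated list on standard input (the demo tree is hypothetical):

```bash
#!/bin/bash
# Sketch, assuming GNU tar: pass find's NUL-separated list to tar on
# stdin (-T -) instead of through xargs. Demo data is hypothetical.
set -euo pipefail

base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir -p "$base/Source Directory/Directory One"
echo "one" > "$base/Source Directory/Directory One/Document One.odt"
echo "two" > "$base/Source Directory/Directory One/Document Two.odt"
cd "$base"

# -type f: only regular files, so tar is never handed a directory that
#   it would recurse into and then add the same files a second time.
# --null placed before -T so it applies to the list read from stdin;
#   one tar process sees the whole list.
find "Source Directory" -type f -print0 |
    tar -czf Backups.tgz --null -T -

tar -tzf Backups.tgz
```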

    TheFu:
    Code:
    tar cvzf  /home/kevin/Test/Backups.tgz  "/home/kevin/Test/Source Directory"
    also works perfectly! Everything I have read says that tar cannot cope with spaces in paths, so I am thoroughly confused.

    I may investigate some of the more sophisticated backup alternatives, but just wanted to get a basic backup system set up.


    Many thanks to both of you for your help.

    Kevin

  5. #5
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: Duplicated files in find | tar backup

    Quote Originally Posted by kevin1
    I may investigate some of the more sophisticated backup alternatives, but just wanted to get a basic backup system set up.
    If you want bone-head simple, end-user backups, use Back-in-Time http://backintime.le-web.org/ . It is in the repos, has a GUI (both Gnome and KDE) and it works. The target HDD will need a Linux filesystem, since hard links are used. Most importantly, it is 100% automatic and keeps more backup sets closer to "now". Over time, fewer and fewer backups are retained: 1 per prior year, 1 per month in the current year, ... you get the idea.

    http://blog.jdpfu.com/2011/11/12/bes...mputer-backups lists the attributes for backup best practices:

    1. Stable / Works Every Time
    2. Automatic
    3. Different Storage Media
    4. Fast
    5. Efficient
    6. Secure
    7. Versioned
    8. Offsite / Remote
    9. Restore Tested
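    Point 9 is one a plain find | tar script can check for itself. A minimal sketch, using a hypothetical scratch tree: extract the archive into an empty directory and let diff -r compare the result with the source.

```bash
#!/bin/bash
# Sketch of a restore test: back up, restore elsewhere, compare.
# The directory names are hypothetical demo data.
set -euo pipefail

base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir -p "$base/Source Directory/Directory One"
echo "one" > "$base/Source Directory/Directory One/Document One.odt"

tar -czf "$base/Backups.tgz" -C "$base" "Source Directory"

# Restore into an empty directory; tar -C needs it to exist already.
mkdir -p "$base/Restored"
tar -xzf "$base/Backups.tgz" -C "$base/Restored"

# diff -r exits 0 only if both trees hold the same files and contents.
if diff -r "$base/Source Directory" "$base/Restored/Source Directory"; then
    echo "restore test passed"
else
    echo "restore test FAILED" >&2
    exit 1
fi
```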

    Achieving all of those usually costs some usability. I give up encrypted remote storage for usability, but apart from that, rdiff-backup handles everything else nicely for system-level backups and is more efficient than the other methods I've tried: 30-60 days of backups need only 10-20% more storage than a single mirror. THAT is impressive.

  6. #6
    Join Date
    Apr 2005
    Beans
    50

    Re: Duplicated files in find | tar backup

    Thanks for the advice TheFu.

  7. #7
    Join Date
    May 2007
    Location
    Leeds, UK
    Beans
    1,675
    Distro
    Ubuntu

    Re: Duplicated files in find | tar backup

    Quote Originally Posted by kevin1
    '-type f' fixed the problem - I'm curious as to why it is necessary.
    I'm not totally clear on this, but I think it is because find is picking up hard links, presumably from '.' and '..', as you traverse the directory structure. That theory is consistent with the messages you were getting. Restricting the find to regular files fixes the issue.

    Quote Originally Posted by kevin1
    Everything I have read says that tar cannot cope with spaces in paths, so I am thoroughly confused.
    The GNU utilities often have improvements and additional options over their traditional Unix counterparts. Although they aim to implement existing standards, e.g. POSIX, they are rewrites inspired by the originals rather than derivatives of them. There is no definitive tar program, just different implementations of a tape archiver called tar.
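    On the spaces question specifically: tar receives each command-line argument intact, so a name with spaces is fine as long as the shell passes it as one argument. Trouble usually comes from an unquoted variable, which the shell splits at the spaces before tar ever runs. A small demonstration on a hypothetical scratch directory:

```bash
#!/bin/bash
# Sketch: spaces only break tar when the shell splits an unquoted
# path before tar sees it. Demo data is hypothetical.
set -euo pipefail

base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir -p "$base/Source Directory"
echo "data" > "$base/Source Directory/Document One.odt"
cd "$base"

dir="Source Directory"

# Quoted: tar gets one argument, "Source Directory", and succeeds.
tar -cf quoted.tar "$dir"

# Unquoted: the shell passes two arguments, "Source" and "Directory",
# neither of which exists, so tar reports errors and exits non-zero.
if tar -cf unquoted.tar $dir 2>/dev/null; then
    echo "unquoted: tar succeeded"
else
    echo "unquoted: tar failed because of word splitting"
fi
```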

  8. #8
    Join Date
    Apr 2012
    Beans
    7,256

    Re: Duplicated files in find | tar backup

    Isn't it just because tar is the original serial (tape) archiver? It never 'winds the tape back' to see whether a file is already in the archive:

    Code:
    $ tar cvf tests.tar tests/one tests/one tests/one
    tests/one
    tests/one
    tests/one
    $ 
    $ tar tf tests.tar 
    tests/one
    tests/one
    tests/one
    So if you tell it to archive a directory (implicitly including its contents) and then explicitly tell it to archive each of the files in the directory, it does exactly that:

    Code:
    $ tar cvf tests.tar tests tests/one tests/two tests/three
    tests/
    tests/two
    tests/three
    tests/one
    tests/one
    tests/two
    tests/three
    $ 
    $ tar tf tests.tar
    tests/
    tests/two
    tests/three
    tests/one
    tests/one
    tests/two
    tests/three
    By adding -type f to find, you are no longer passing the parent directories (which tar implicitly descends into) as well as the individual files.
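    One side effect of -type f is that empty directories are no longer archived, since find never lists them. A sketch of an alternative, assuming GNU tar: keep the directories in find's output but add --no-recursion, so tar stores each directory as a bare entry without descending into it and every file appears exactly once. The demo tree, including the empty directory, is hypothetical.

```bash
#!/bin/bash
# Sketch, assuming GNU tar: list directories as well as files, but stop
# tar from recursing, so empty directories are kept and nothing is
# stored twice. Demo data is hypothetical.
set -euo pipefail

base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir -p "$base/Source Directory/Directory One" \
         "$base/Source Directory/Empty Directory"
echo "one" > "$base/Source Directory/Directory One/Document One.odt"
cd "$base"

# --no-recursion: each directory find lists is stored as a bare
# directory entry; its files are added only when find lists them.
find "Source Directory" -print0 |
    tar -cf backup.tar --no-recursion --null -T -

tar -tvf backup.tar
```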

    Or am I missing something?

  9. #9
    Join Date
    May 2007
    Location
    Leeds, UK
    Beans
    1,675
    Distro
    Ubuntu

    Re: Duplicated files in find | tar backup

    I like the theory but how would it explain that the "copies" were zero size?

  10. #10
    Join Date
    Apr 2012
    Beans
    7,256

    Re: Duplicated files in find | tar backup

    Sorry, you're right, I totally missed that part of the discussion. In fact, in my simple test above, tar tvf shows that the duplicates are added as hard links:

    Code:
    $ tar tvf tests.tar
    drwxr-xr-x steeldriver/steeldriver   0 2013-08-11 17:27 tests/
    -rw-rw-r-- steeldriver/steeldriver 245 2013-08-11 17:27 tests/two
    -rw-rw-r-- steeldriver/steeldriver  26 2013-08-11 17:27 tests/three
    -rw-rw-r-- steeldriver/steeldriver 256 2013-08-11 17:27 tests/one
    hrw-rw-r-- steeldriver/steeldriver   0 2013-08-11 17:27 tests/one link to tests/one
    hrw-rw-r-- steeldriver/steeldriver   0 2013-08-11 17:27 tests/two link to tests/two
    hrw-rw-r-- steeldriver/steeldriver   0 2013-08-11 17:27 tests/three link to tests/three
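    A sketch for spotting this in an existing archive, reproducing the test above on a hypothetical scratch directory: sort | uniq -d prints member names stored more than once, and a verbose listing line starting with h is an entry stored as a hard link to an earlier member. Whether duplicates are written as hard links or as full copies may depend on the tar implementation and version; the h entries above are what GNU tar produced in this thread.

```bash
#!/bin/bash
# Sketch: reproduce the duplicate entries and find them in the listing.
# Demo data is hypothetical.
set -euo pipefail

base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir -p "$base/tests"
echo "one" > "$base/tests/one"
echo "two" > "$base/tests/two"
cd "$base"

# The directory (which tar recurses into) plus its files named again,
# much like what find | xargs tar passed in the first post.
tar -cf tests.tar tests tests/one tests/two

# Member names stored more than once.
tar -tf tests.tar | sort | uniq -d

# Listing lines whose type character is 'h' (hard link to an earlier
# member); grep exits non-zero when there are none, hence the fallback.
tar -tvf tests.tar | grep '^h' || echo "no hard-link entries"
```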
