Thread: Is this right? 4.4 GB folder compressed to 898 MB?

  1. #1
    Join Date
    Apr 2009
    Location
    Costa Rica
    Beans
    255
    Distro
    Ubuntu 10.04 Lucid Lynx

    Is this right? 4.4 GB folder compressed to 898 MB?

    I am trying to make a backup of a folder whose size is 4.4 GB. It also contains subfolders that I need to back up, so I figured I could tar the entire folder like this:

    Code:
    tar cvzf backup.tar.gz foldertobackup
    This yields an 898 MB file.

    Can the compression really be that good? I suspect something isn't being backed up. How can I find out?

    thanks for the help!
    Last edited by X1R1; September 7th, 2012 at 10:49 PM.
    Linux User#498977
    There are only 10 types of people in the world. Those who understand binary, and those who dont.
    My Blog about Linux and other stuff

  2. #2
    Join Date
    Nov 2009
    Location
    Catalunya, Spain
    Beans
    14,558
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: Is this right? 4.4 GB folder compressed to 898 MB?

    I guess it depends on the format of the files being compressed. How about untarring it to another location and comparing the two?

    Also opening the .tar file with Archive Manager will show you the content. Right-click, open with archive manager.

    PS. When I said right-click I forgot for a moment this is in the server section so you probably don't have a GUI.
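
    On a server without a GUI, the same check can be done from the command line. A minimal sketch, assuming GNU tar and gzip; the demo directory and its files are made up for illustration, with backup.tar.gz standing in for the real archive:

```shell
# Build a small throwaway example (hypothetical names) in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p demo/sub
echo "hello" > demo/a.txt
echo "world" > demo/sub/b.txt
tar czf backup.tar.gz demo

# t = list the contents instead of extracting them.
tar tzf backup.tar.gz

# Count the entries (files and directories) stored in the archive.
entries=$(tar tzf backup.tar.gz | wc -l)
echo "entries in archive: $entries"

# Test the gzip stream for corruption without extracting anything.
gzip -t backup.tar.gz && echo "gzip stream OK"
```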
    Darko.
    -----------------------------------------------------------------------
    Ubuntu 18.04 LTS 64bit

  3. #3
    Join Date
    Jun 2007
    Beans
    175

    Re: Is this right? 4.4 GB folder compressed to 898 MB?

    The amount of compression you can get depends on a number of things, among them the type of file you are compressing.
    JPG and PDF files will usually hardly compress at all. Things like bitmaps and older Word documents will compress considerably. Uncompressing your archive and doing a file count is a quick and easy way to check.

    Dirdiff will help you compare files and folders for a more thorough check. Data loss through compression and decompression would be most surprising.
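
    Without a graphical tool, the comparison can be sketched as follows, assuming GNU tar and diffutils; foldertobackup is recreated here as a tiny throwaway example:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p foldertobackup/sub
echo "some data" > foldertobackup/a.txt
echo "more data" > foldertobackup/sub/b.txt
tar czf backup.tar.gz foldertobackup

# Extract into a separate location so the original is not overwritten.
mkdir restore
tar xzf backup.tar.gz -C restore

# Recursively compare the two trees; exit status 0 means identical contents.
if diff -r foldertobackup restore/foldertobackup; then
    result="identical"
else
    result="different"
fi
echo "trees are $result"

# Alternative without extracting: d compares the archive with the files on disk.
tar dzf backup.tar.gz && echo "archive matches the files on disk"
```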

  4. #4
    Join Date
    Dec 2007
    Location
    Idaho
    Beans
    4,976
    Distro
    Ubuntu 20.04 Focal Fossa

    Re: Is this right? 4.4 GB folder compressed to 898 MB?

    Do you have symlinks inside that folder? Tar doesn't follow them by default; it just stores them as symlinks. As others said, compression will vary widely with what you are compressing.
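
    To check whether symlinks explain a size difference, one can list them with find and, if their targets should be archived too, ask tar to dereference them with "h". A sketch assuming GNU find, tar and stat, with made-up file names:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir foldertobackup
head -c 100000 /dev/urandom > outside.bin      # data living outside the folder
ln -s "$workdir/outside.bin" foldertobackup/link.bin

# List every symlink in the tree and count them.
links=$(find foldertobackup -type l | wc -l)
echo "symlinks found: $links"

# Default: the link itself is stored, so the archive stays tiny.
tar cf links.tar foldertobackup
# h (--dereference): the file the link points to is stored instead.
tar chf deref.tar foldertobackup

small=$(stat -c %s links.tar)
large=$(stat -c %s deref.tar)
echo "archive with links: $small bytes, dereferenced: $large bytes"
```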
    "You can't expect to hold supreme executive power just because some watery tart lobbed a sword at you"

    "Don't let your mind wander -- it's too little to be let out alone."

  5. #5
    Join Date
    Apr 2009
    Location
    Costa Rica
    Beans
    255
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Is this right? 4.4 GB folder compressed to 898 MB?

    Ok I copied the file to a different server via scp, and untarred, then did a:

    Code:
    du -hs uncompressedfolder
    result: 4.4G

    As I was still skeptical, I ran du and counted the lines of output:

    Code:
    du -h uncompressedfolder | wc -l
    result: 8321

    Did the same command on the original folder, and...result: 8321

    amazing
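
    One caveat on the count: without "-a", du prints one line per directory, so the figure above counts directories rather than files. A sketch of a per-file check, assuming GNU find and coreutils, on a throwaway tree:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p original/sub
echo one > original/a.txt
echo two > original/sub/b.txt
echo three > original/sub/c.txt
cp -a original copy

# du without -a prints one line per directory: 2 here, not 3.
dirs=$(du original | wc -l)

# find -type f counts regular files only.
files_orig=$(find original -type f | wc -l)
files_copy=$(find copy -type f | wc -l)
echo "directories: $dirs, files: $files_orig (original) vs $files_copy (copy)"

# Checksums catch content differences that a matching count would miss.
sum_orig=$(cd original && find . -type f -exec md5sum {} + | sort -k 2 | md5sum)
sum_copy=$(cd copy && find . -type f -exec md5sum {} + | sort -k 2 | md5sum)
[ "$sum_orig" = "$sum_copy" ] && echo "contents match"
```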

  6. #6
    Join Date
    Nov 2008
    Location
    Boston MetroWest
    Beans
    16,326

    Re: Is this right? 4.4 GB folder compressed to 898 MB?

    Text files will often achieve 4:1 compression or better. It also depends on how much variability the files have. Many compression algorithms will replace a string of identical characters with just one placeholder and a counter. If the files have lots of spaces, they are very compressible.
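
    A quick way to see this effect, assuming gzip and GNU coreutils are installed: compress 100,000 spaces and 100,000 random bytes and compare the results. gzip's DEFLATE is not a plain run-length scheme, but long runs of one character shrink in much the same way.

```shell
# 100,000 space characters: one long run of a single byte.
spaces=$(head -c 100000 /dev/zero | tr '\0' ' ' | gzip -c | wc -c)

# 100,000 random bytes: no patterns for the compressor to exploit.
random=$(head -c 100000 /dev/urandom | gzip -c | wc -c)

echo "100000 spaces -> $spaces bytes after gzip"
echo "100000 random -> $random bytes after gzip"
```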

    Most things like graphics and video are already compressed and will show little improvement. Higher compression than gzip can usually be achieved with the bzip2 algorithm (xz often goes further still). bzip2 is selected in tar with the "j" switch (don't ask me why it is "j"; maybe they were just running out of letters):

    Code:
    tar cjpvf mydirectory.tar.bz2 mydirectory
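    Basically you just use "j" instead of "z" to get bzip2 instead of gzip. By the way, the "p" switch "preserves" permissions. Tar records permissions in the archive either way; "p" matters mostly when extracting as a regular user, where it stops the umask from being applied to the restored files. See "man tar" for details.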
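
    To see how much the switch from "z" to "j" gains on a particular data set, both archives can be written to a pipe and measured without creating any files. A sketch assuming GNU tar with gzip and bzip2 installed; the sample directory is generated just for the demonstration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir mydirectory
# Repetitive text, similar in spirit to logs or SQL dumps.
for i in $(seq 1 20000); do
    echo "line $i: the quick brown fox jumps over the lazy dog"
done > mydirectory/sample.txt

# "f -" writes the archive to stdout so wc can count its bytes.
raw=$(tar cf - mydirectory | wc -c)
gz=$(tar czf - mydirectory | wc -c)
bz=$(tar cjf - mydirectory | wc -c)

echo "uncompressed tar: $raw bytes"
echo "gzip  (z): $gz bytes"
echo "bzip2 (j): $bz bytes"
```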
    Last edited by SeijiSensei; September 8th, 2012 at 12:58 AM.
    If you ask for help, do not abandon your request. Please have the courtesy to check for responses and thank the people who helped you.

    Blog · Linode System Administration Guides · Android Apps for Ubuntu Users

  7. #7
    Join Date
    Apr 2009
    Location
    Costa Rica
    Beans
    255
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Is this right? 4.4 GB folder compressed to 898 MB?

    @SeijiSensei

    thanks for that informative post!

    Indeed, there are a lot of text files, and there is also a PostgreSQL database in there, but I have no idea what format that uses.

    I have used bzip2 in the past, but I find that it takes a lot longer to decompress the data (of course: better compression, longer decompression).
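
    The trade-off can be measured directly. A sketch assuming gzip, bzip2 and GNU date (for the nanosecond "%N" format) are installed; the sample data is generated, and actual timings depend on the machine and the data:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Roughly 10 MB of repetitive text as sample data.
seq 1 1500000 > sample.txt
orig=$(wc -c < sample.txt)

gzip -c sample.txt > sample.txt.gz
bzip2 -c sample.txt > sample.txt.bz2

# Decompress to /dev/null and measure elapsed wall-clock time in milliseconds.
start=$(date +%s%N)
gzip -dc sample.txt.gz > /dev/null
gz_ms=$(( ($(date +%s%N) - start) / 1000000 ))

start=$(date +%s%N)
bzip2 -dc sample.txt.bz2 > /dev/null
bz_ms=$(( ($(date +%s%N) - start) / 1000000 ))

echo "gzip decompression:  $gz_ms ms"
echo "bzip2 decompression: $bz_ms ms"

# Both must restore exactly the original number of bytes.
gz_out=$(gzip -dc sample.txt.gz | wc -c)
bz_out=$(bzip2 -dc sample.txt.bz2 | wc -c)
echo "original: $orig, gzip restored: $gz_out, bzip2 restored: $bz_out"
```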

    And thanks for that "p" switch, It looks really useful!

    cheers

  8. #8
    Join Date
    Nov 2008
    Location
    Boston MetroWest
    Beans
    16,326

    Re: Is this right? 4.4 GB folder compressed to 898 MB?

    If you are compressing PG backups from pg_dump, then they will be very compressible. They have a lot of spaces, tabs, etc. If you're compressing /var/lib/pgsql, my guess is that it will be much less so.

  9. #9
    Join Date
    Apr 2006
    Beans
    996
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Is this right? 4.4 GB folder compressed to 898 MB?

    Another possibility is that some binary files were full of zero bytes. I have reached this kind of ratio when compressing ISO images that were actually padded with zeros at the end. Files with long repetitions of a single byte, such as zeros or spaces, are usually very compressible.
    Xye incredibly difficult puzzle game with minimal graphics. Also at playdeb
    Got a blog: Will Stay Free

  10. #10
    Join Date
    Jan 2012
    Beans
    753

    Re: Is this right? 4.4 GB folder compressed to 898 MB?

    Compression ratio depends heavily on how "random" the data to be compressed is. Data that is very random, with a lot of "entropy", makes it hard for compression utilities to find patterns, whereas data that is the opposite is easier to compress. Here's a small script I made to test how well different compression algorithms work on random data and zeros. It should also show how tremendous the variation in compression ratio is:
    Code:
    #!/bin/bash
    # Compare how well $prog compresses random data and zeros.
    
    prog=gzip
    
    # Work in a private scratch directory so no existing files are touched.
    cd "$(mktemp -d)" || exit 1
    
    echo -n "Size of random data before compression with $prog: "
    # Write a fixed 64 MiB instead of killing every running dd after a second.
    dd if=/dev/urandom of=randombytes bs=1M count=64 2>/dev/null
    ls -lh randombytes | awk '{print $5}'
    echo -n "Size after compression: "
    "$prog" randombytes
    ls -lh randombytes.* | awk '{print $5}'
    rm randombytes.*
    
    echo -en "\nSize of zeros before compression with $prog: "
    dd if=/dev/zero of=zeros bs=1M count=64 2>/dev/null
    ls -lh zeros | awk '{print $5}'
    echo -n "Size after compression: "
    "$prog" zeros
    ls -lh zeros.* | awk '{print $5}'
    rm zeros.*
    
    exit 0
    An interesting fact I found with this script: bzip2 compresses completely repetitive files thousands of times better than gzip does, but on random data bzip2 produces a bigger file than gzip does.
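
    The same comparison can be run for several compressors in one pass. A sketch assuming GNU coreutils; the 10 MB sample size and the compressed_size helper are arbitrary choices made for this example, and any tool in the list that is not installed is skipped:

```shell
size=10000000   # bytes of test data per input (arbitrary)

# Print the compressed size of $size bytes read from the given device.
# $1 = compressor, $2 = source device
compressed_size() {
    head -c "$size" "$2" | "$1" -c | wc -c
}

for prog in gzip bzip2 xz; do
    # Skip compressors that are not installed.
    command -v "$prog" > /dev/null || continue
    rand=$(compressed_size "$prog" /dev/urandom)
    zero=$(compressed_size "$prog" /dev/zero)
    echo "$prog: random -> $rand bytes, zeros -> $zero bytes"
done
```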
