Re: Is this right? 4.4 GB folder compressed to 898 MB?
Compression ratio depends heavily on how "random" the data being compressed is. Very random data, with a lot of "entropy", gives compression utilities few patterns to find, whereas highly repetitive data is easy to compress. Here's a small script I made to test how well different compression programs handle random data and zeros; it should show how tremendously the compression ratio can vary:
Code:
#!/bin/bash
# Compare how well $prog compresses random data versus zeros.
prog=gzip      # try bzip2 as well
size=10M       # amount of test data to generate (GNU head size suffix)
cd /tmp || exit 1

# $1 = description, $2 = source device, $3 = file name to create
test_compression() {
    echo -n "Size of $1 before compression with $prog: "
    # Read a fixed amount of data instead of timing a background dd and
    # stopping it with killall, which would also kill any other dd running.
    head -c "$size" "$2" > "$3"
    ls -lh "$3" | awk '{print $5}'
    echo -n "Size after compression: "
    "$prog" -f "$3"        # replaces $3 with $3.gz (or $3.bz2, ...)
    ls -lh "$3".* | awk '{print $5}'
    rm -f "$3".*
}

test_compression "random data" /dev/urandom randombytes
echo
test_compression "zeros" /dev/zero zeros
exit 0
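To put a rough number on the "entropy" mentioned above, here is a sketch that estimates Shannon entropy in bits per byte from how often each byte value occurs. Random data should come out close to 8 (the maximum), zeros at exactly 0. It assumes GNU coreutils (`od`, and `head -c` with a size suffix) plus awk; the function name `byte_entropy` is just an illustrative choice.

```shell
#!/bin/bash
# Estimate Shannon entropy in bits per byte from single-byte frequencies.
# 0 = completely predictable, 8 = every byte value equally likely.
byte_entropy() {
    # -v stops od from collapsing repeated lines into "*" (matters for zeros)
    od -An -v -tu1 "$1" |
    awk '{ for (i = 1; i <= NF; i++) count[$i]++; n += NF }
         END {
             if (n == 0) { print "0.000"; exit }
             for (b in count) { p = count[b] / n; h -= p * log(p) / log(2) }
             printf "%.3f\n", h
         }'
}

cd /tmp || exit 1
head -c 1M /dev/urandom > randombytes
head -c 1M /dev/zero > zeros
echo "randombytes: $(byte_entropy randombytes) bits/byte"
echo "zeros:       $(byte_entropy zeros) bits/byte"
rm -f randombytes zeros
```

This only looks at byte frequencies, not their order, so it is an approximation: a file that cycles through all 256 byte values over and over scores 8 bits/byte, yet any compressor squeezes it down to almost nothing.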
An interesting fact I found by running the script with prog=bzip2 instead of gzip: bzip2 compresses completely repetitive files thousands of times better than gzip does, but it produces larger output than gzip for random data.
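To see the gzip-versus-bzip2 difference side by side, the sketch below pipes the same amount of random data and zeros through several compressors and counts the output bytes with `wc -c`, so no temporary files are needed. It assumes `gzip`, `bzip2` and `xz` (xz is included only as an extra point of comparison) and GNU `head -c` size suffixes; compressors that aren't installed are skipped. The helper name `compressed_size` is hypothetical.

```shell
#!/bin/bash
# Print compressed sizes (in bytes) of random data and of zeros
# for several compressors, reading everything through pipes.
size=10M    # amount of input per test (GNU head size suffix)

# $1 = compressor, $2 = input device
compressed_size() {
    head -c "$size" "$2" | "$1" -c | wc -c
}

for prog in gzip bzip2 xz; do
    if ! command -v "$prog" >/dev/null; then
        echo "$prog: not installed, skipping"
        continue
    fi
    printf '%-6s random: %10d bytes   zeros: %8d bytes\n' "$prog" \
        "$(compressed_size "$prog" /dev/urandom)" \
        "$(compressed_size "$prog" /dev/zero)"
done
```

Because each call reads a fresh chunk from /dev/urandom, the random-data sizes vary slightly between runs; the zero results are identical every time.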