my hot programming tip - use gzip when downloading pages



NinjaWork
January 25th, 2009, 12:03 PM
Geez man...I've been doing data mining for years. I once downloaded 30,000 pages in an afternoon.

And after all that time, I finally tried making a request using gzip. The file size dropped from 19k to 4.5k and the script sped up from 2 secs to almost zero... I was amazed :p

slavik
January 25th, 2009, 01:59 PM
could you give a bit more detail on the command you used?

NinjaWork
January 25th, 2009, 02:37 PM
the key is adding this header to the HTTP request:



Accept-Encoding: gzip


and, if you are using Perl, decompress the response body like this:

use Compress::Zlib;
# $data holds the raw (gzipped) response body
my $uncompressed = Compress::Zlib::memGunzip($data);


not all sites use gzip... for example, nytimes.com did and feedburner didn't
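
Since some servers ignore the Accept-Encoding header, it is probably safest to gunzip only when the response actually comes back with Content-Encoding: gzip. Here is a minimal sketch of that check. The names decode_body and fetch_page are made up for illustration, and using HTTP::Tiny for the request is an assumption (the post does not say which HTTP client was used).

```perl
use strict;
use warnings;
use Compress::Zlib;   # provides memGzip / memGunzip
use HTTP::Tiny;       # core module since Perl 5.14; assumed client, not from the post

# Return the page body, gunzipping it only when the server says it is gzipped.
# $encoding is the value of the Content-Encoding response header (may be undef).
sub decode_body {
    my ($encoding, $body) = @_;
    return $body unless defined $encoding && $encoding =~ /\bgzip\b/i;
    my $plain = Compress::Zlib::memGunzip($body);
    die "gunzip failed\n" unless defined $plain;
    return $plain;
}

# Hypothetical helper: fetch a URL while advertising gzip support.
sub fetch_page {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get(
        $url, { headers => { 'Accept-Encoding' => 'gzip' } }
    );
    die "request failed: $res->{status}\n" unless $res->{success};
    # HTTP::Tiny lower-cases header names and does not decompress by itself.
    return decode_body($res->{headers}{'content-encoding'}, $res->{content});
}

# Only touches the network when a URL is given on the command line.
print length(fetch_page($ARGV[0])), " bytes after decoding\n" if @ARGV;
```

Run it as "perl fetch.pl URL" (the script name is arbitrary). A site that does not compress simply comes back with no Content-Encoding header and the body is passed through untouched.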