kittkatt
November 23rd, 2011, 07:03 AM
There is a free daily podcast on www.schiffradio.com that I enjoy listening to, but the site seems to use some sort of link obfuscation/download security scripting (I think to prevent other websites from linking directly to its content).
I am trying to automate the daily download for personal use, since it's annoying to have to go to the website every day and manually click "Save target as...". The RSS feed only carries clips and does not include the full broadcast featured on the homepage.
This is the shell script I wrote to try to automate the process, but wget fails on the obfuscated filename. The website name is the first command-line argument (e.g. ./getlinks.sh www.schiffradio.com).
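#!/bin/sh
# Scrape a web page and download every mp3 file linked from it.
# Usage: ./getlinks.sh www.schiffradio.com
page=$1
echo "$page"
# Dump the raw page source; zcat assumes the server sends it gzip-compressed.
w3m -dump_source "$page" | zcat > tempsource
# Extract each http://...mp3 link. [^"<> ]* keeps a match from running across
# several links on one line (assumes the links contain no quotes or spaces).
# sort -u removes duplicates even when they are not adjacent (uniq alone does not).
grep --ignore-case --only-matching --regexp='http://[^"<> ]*\.mp3' tempsource | sort -u > duplicatesremoved
wget --input-file=duplicatesremoved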
Linux:~$ ./getlinks.sh www.schiffradio.com
www.schiffradio.com
--2011-11-22 21:28:05-- http://www.schiffradio.com/site/rd;jsessionid=44556A76EE1EEE77DB1E1F09C1690225?satype=2&said=1&url=http%3A%2F%2Fwww.schiffradio.com%2Fdownloadsecurity%3Furl%3DaHR0cDovL2ZldGNoLm5veHNvbHV0aW9ucy5jb20vc2NoaWZmL2F1ZGlvL3BhXzIwMTExMTIyX2xvdy5tcDMqKnwxMzIyMDI2MDg0NDA2Kip8.mp3
Resolving www.schiffradio.com... 72.26.99.238
Connecting to www.schiffradio.com|72.26.99.238|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://www.schiffradio.com/downloadsecurity?url=aHR0cDovL2ZldGNoLm5veHNvbHV0aW9ucy5jb20vc2NoaWZmL2F1ZGlvL3BhXzIwMTExMTIyX2xvdy5tcDMqKnwxMzIyMDI2MDg0NDA2Kip8.mp3 [following]
--2011-11-22 21:28:05-- http://www.schiffradio.com/downloadsecurity?url=aHR0cDovL2ZldGNoLm5veHNvbHV0aW9ucy5jb20vc2NoaWZmL2F1ZGlvL3BhXzIwMTExMTIyX2xvdy5tcDMqKnwxMzIyMDI2MDg0NDA2Kip8.mp3
Reusing existing connection to www.schiffradio.com:80.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://fetch.noxsolutions.com/schiff/audio/pa_20111122_low.mp3 [following]
--2011-11-22 21:28:06-- http://fetch.noxsolutions.com/schiff/audio/pa_20111122_low.mp3
Resolving fetch.noxsolutions.com... 216.38.170.101
Connecting to fetch.noxsolutions.com|216.38.170.101|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 29128359 (28M) [audio/x-mpeg]
rd;jsessionid=44556A76EE1EEE77DB1E1F09C1690225?satype=2&said=1&url=http:%2F%2Fwww.schiffradio.com%2Fdownloadsecurity?url=aHR0cDovL2ZldGNoLm5veHNvbHV0aW9ucy5jb20vc2NoaWZmL2F1ZGlvL3BhXzIwMTExMTIyX2xvdy5tcDMqKnwxMzIyMDI2MDg0NDA2Kip8.mp3: File name too long
Any thoughts? I'm not familiar with how this link obfuscation download security scripting works.
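One observation from the log above: the url= value in the downloadsecurity link looks like Base64, and the final redirect goes to a plain .mp3 address. Here is a minimal sketch of decoding that value, assuming GNU base64 and sed are available and that the site keeps this exact scheme. The helper name decode_target is made up for illustration, and the meaning of the trailing "**|number**|" part is a guess (it looks like a timestamp).

```sh
#!/bin/sh
# decode_target (hypothetical helper): print the direct mp3 address hidden
# in a schiffradio.com "downloadsecurity" or "site/rd" link.
# Assumption: the url= value is Base64 followed by a literal ".mp3" suffix,
# as in the wget log above.
decode_target() {
    printf '%s\n' "$1" |
        sed -e 's/.*url%3D//' -e 's/.*url=//' -e 's/\.mp3$//' |  # keep only the Base64 text
        base64 -d |                                              # GNU coreutils decode flag
        sed 's/\*\*.*$//'                                        # drop the trailing "**|...**|" part
}

# Link copied from the Location: header in the log above.
link='http://www.schiffradio.com/downloadsecurity?url=aHR0cDovL2ZldGNoLm5veHNvbHV0aW9ucy5jb20vc2NoaWZmL2F1ZGlvL3BhXzIwMTExMTIyX2xvdy5tcDMqKnwxMzIyMDI2MDg0NDA2Kip8.mp3'
decode_target "$link"
# prints: http://fetch.noxsolutions.com/schiff/audio/pa_20111122_low.mp3
```

Separately, wget has a --trust-server-names option that names the saved file after the last redirect target (pa_20111122_low.mp3 here) rather than the original long URL, and --output-document lets you choose the name outright. Either may be enough to get past the "File name too long" error without decoding anything.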