
Thread: How can my bash script download from a website which uses link obfuscation?

  1. #1
    Join Date
    Nov 2008
    Beans
    48

    How can my bash script download from a website which uses link obfuscation?

    The site www.schiffradio.com offers a free daily podcast that I enjoy listening to, but it seems to use some sort of link obfuscation/download security scripting (I think to prevent other websites from linking to its content).

    I am trying to automate the daily download for personal use, since it's annoying to have to go to the website every day and manually click "Save target as...". The RSS feed only carries clips and does not include the full broadcast featured on the homepage.

    This is the shell script I wrote to automate the process, but wget gets hung up on the obfuscated filename. The website name is passed as the first command-line argument (i.e. ./getlinks.sh www.schiffradio.com).

    Code:
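    #!/bin/sh

    # Scrape a web page and download every MP3 file linked from it.
    # Usage: getlinks.sh <site>

    echo "$1"
    # The page source arrives gzip-compressed, so decompress it.
    w3m -dump_source "$1" | zcat > tempsource
    # Extract every http://...mp3 URL from the page source.
    grep --ignore-case --only-matching --regexp='http://.*\.mp3' tempsource > tempoutput
    # uniq alone only drops adjacent duplicates, so sort as well.
    sort -u tempoutput > duplicatesremoved
    rm tempoutput
    wget --input-file=duplicatesremoved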
    Linux:~$ ./getlinks.sh www.schiffradio.com
    www.schiffradio.com
    --2011-11-22 21:28:05-- http://www.schiffradio.com/site/rd;j...g0NDA2Kip8.mp3
    Resolving www.schiffradio.com... 72.26.99.238
    Connecting to www.schiffradio.com|72.26.99.238|:80... connected.
    HTTP request sent, awaiting response... 302 Moved Temporarily
    Location: http://www.schiffradio.com/downloads...g0NDA2Kip8.mp3 [following]
    --2011-11-22 21:28:05-- http://www.schiffradio.com/downloads...g0NDA2Kip8.mp3
    Reusing existing connection to www.schiffradio.com:80.
    HTTP request sent, awaiting response... 302 Moved Temporarily
    Location: http://fetch.noxsolutions.com/schiff...111122_low.mp3 [following]
    --2011-11-22 21:28:06-- http://fetch.noxsolutions.com/schiff...111122_low.mp3
    Resolving fetch.noxsolutions.com... 216.38.170.101
    Connecting to fetch.noxsolutions.com|216.38.170.101|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 29128359 (28M) [audio/x-mpeg]
    rd;jsessionid=44556A76EE1EEE77DB1E1F09C1690225?satype=2&said=1&url=http:%2F%2Fwww.schiffradio.com%2Fdownloadsecurity?url=aHR0cDovL2ZldGNoLm5veHNvbHV0aW9ucy5jb20vc2NoaWZmL2F1ZGlvL3BhXzIwMTExMTIyX2xvdy5tcDMqKnwxMzIyMDI2MDg0NDA2Kip8.mp3: File name too long
    Any thoughts? I'm not familiar with how this link obfuscation download security scripting works.

  2. #2
    Join Date
    Apr 2008
    Beans
    451
    Distro
    Kubuntu 10.04 Lucid Lynx

    Re: How can my bash script download from a website which uses link obfuscation?

    Try a traffic sniffer to understand what this site is doing. There is a popular Firefox add-on called "Live HTTP Headers" that shows you the HTTP requests the stream uses. If the stream uses a protocol other than HTTP (e.g. RTMP, RTSP, MMS), you'll be better off with a more powerful sniffer like Wireshark, and you'll need other tools to dump the stream as well.
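
    The same kind of inspection can be done from the command line. This is only a rough sketch: it assumes curl is installed, and redirect_chain is a made-up helper name. It reads HTTP response headers on stdin and prints the target of every Location header, which shows where each redirect leads.

```shell
#!/bin/sh
# redirect_chain (hypothetical helper): print the target of every
# "Location:" header found in the HTTP headers read from stdin.
redirect_chain() {
    tr -d '\r' |                  # HTTP header lines end in CRLF
        grep -i '^location:' |    # header names are case-insensitive
        sed 's/^[^:]*: *//'       # keep only the header value
}

# Usage (needs network access, so it is left commented out):
#   curl -sIL "http://www.schiffradio.com/" | redirect_chain
# -s silences progress output, -I asks for headers only, -L follows redirects.
```

    Note that -I sends a HEAD request, and some servers answer HEAD differently from GET, so treat the result as a hint.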

    Edit: Turns out you're lucky: it's an HTTP stream. You can use another popular extension to download this kind of stream: DownloadHelper (it also works on YouTube). Install the add-on from the Firefox extension manager and click the stream to start playback; the DownloadHelper logo will start to cycle once it has found a valid URL.
    Last edited by Lampi; November 23rd, 2011 at 04:17 PM.

  3. #3

    Re: How can my bash script download from a website which uses link obfuscation?

    To me, it looks like the www.schiffradio.com/content.mp3 files are really being served from the fetch.noxsolutions.com host.

    The jsessionid (probably used along with an HTTP referrer) appears to serve as an "authorization" mechanism that lets schiffradio.com fetch the MP3s from fetch.noxsolutions.com.
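
    Separately from the jsessionid, the value after "downloadsecurity?url=" in the failing filename from the first post looks like base64. The sketch below decodes it, assuming the spaces in the posted string are paste artifacts and that the trailing ".mp3" is not part of the encoded data. decode_payload and real_url are made-up helper names, and base64 --decode is the GNU coreutils spelling.

```shell
#!/bin/sh
# decode_payload (hypothetical helper): strip whitespace and the trailing
# ".mp3", then base64-decode what is left.
decode_payload() {
    clean=$(printf '%s' "$1" | tr -d '[:space:]')
    clean=${clean%.mp3}
    printf '%s' "$clean" | base64 --decode
}

# real_url (hypothetical helper): keep only the part before the first "**".
real_url() {
    decoded=$(decode_payload "$1")
    printf '%s\n' "${decoded%%\*\**}"
}

# Value copied from the "File name too long" line in the first post.
payload='aHR0cDovL2ZldGNoLm5veHNvbHV0aW9ucy5jb20vc2NoaWZmL2F1ZGlvL3BhXzIwMTExMTIyX2xvdy5tcDMqKnwxMzIyMDI2MDg0NDA2Kip8.mp3'

decode_payload "$payload"; echo
# prints: http://fetch.noxsolutions.com/schiff/audio/pa_20111122_low.mp3**|1322026084406**|
real_url "$payload"
# prints: http://fetch.noxsolutions.com/schiff/audio/pa_20111122_low.mp3
```

    The number between the "**|" separators looks like a Unix timestamp in milliseconds, so the link may well expire after a while. That part is a guess.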

    The bottom line is:
    The script could be corrected, but that may take a bit more than a passing acquaintance with bash.
    The MP3s don't appear to be stored on www.schiffradio.com.

    HTH explain what may be going on there.
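
    As for the "File name too long" error itself: wget follows the redirects fine, but it names the local file after the original, very long URL. One possible workaround, sketched here with made-up names (output_name, download_list, and the "schiff_" prefix are all assumptions), is to pick a short output name for each link with -O.

```shell
#!/bin/sh
# output_name (hypothetical helper): build a short local file name.
# $1: date stamp, $2: running number
output_name() {
    printf 'schiff_%s_%02d.mp3\n' "$1" "$2"
}

# download_list (hypothetical helper): fetch every URL listed in file $1
# (one URL per line, like the duplicatesremoved file from the script).
download_list() {
    list=$1
    stamp=$(date +%Y%m%d)
    n=0
    while IFS= read -r url; do
        [ -n "$url" ] || continue       # skip empty lines
        n=$((n + 1))
        # -O sets the local file name, so the long URL is never used for it.
        wget -O "$(output_name "$stamp" "$n")" "$url"
    done < "$list"
}

# Usage (needs network access, so it is left commented out):
#   download_list duplicatesremoved
```

    wget 1.12 and later also has a --trust-server-names option, which names the file after the last redirect target instead of the original URL. Judging by the log in the first post, that would give a short name ending in _low.mp3.
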
    Windows assumes the user is an idiot.
    Linux demands proof.

  4. #4
    Join Date
    Apr 2008
    Beans
    451
    Distro
    Kubuntu 10.04 Lucid Lynx

    Re: How can my bash script download from a website which uses link obfuscation?

    @habitual: it's http://nox.mp3.lee.miisolutions.net/schiff

    So to dump it, try something like this:

    Code:
    wget -O "The_Peter_Schiff_Show_-_$(date +%F).mpeg" http://nox.mp3.lee.miisolutions.net/schiff
    Hit Ctrl+C to stop it once it's done.
    Last edited by Lampi; November 23rd, 2011 at 04:26 PM. Reason: replaced ' with "

  5. #5

    Re: How can my bash script download from a website which uses link obfuscation?

    Good eye, Lampi!

  6. #6
    Join Date
    Apr 2008
    Beans
    451
    Distro
    Kubuntu 10.04 Lucid Lynx

    Re: How can my bash script download from a website which uses link obfuscation?

    Maybe there is a better way to download it than wget? This stream might be some sort of endless loop, and I don't know a better way to cope with that than hitting Ctrl+C once the show is done and the loop restarts. Since it's pure MPEG audio, you can listen to it (even seek forward) while it is dumping.
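
    If the stream really never ends, one way to avoid babysitting it is to cap the recording time instead of pressing Ctrl+C. This is a sketch only: it assumes the timeout command from GNU coreutils is available, record_for is a made-up helper name, and the two-hour limit in the usage line is a guess at the show length.

```shell
#!/bin/sh
# record_for (hypothetical helper): run a command, but stop it after the
# given duration. $1: duration (e.g. 90m, 2h), rest: the command to run.
record_for() {
    duration=$1
    shift
    status=0
    timeout "$duration" "$@" || status=$?
    # timeout exits with 124 when it had to stop the command. For an
    # endless stream that is the expected outcome, so report success.
    if [ "$status" -eq 124 ]; then
        return 0
    fi
    return "$status"
}

# Usage (needs network access, so it is left commented out):
#   record_for 2h wget -O "schiff_$(date +%F).mp3" http://nox.mp3.lee.miisolutions.net/schiff
```

    Whatever was downloaded before the cutoff stays on disk, and a truncated MPEG audio file should still play.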
