
Thread: wget: download specific links of a certain page

  1. #1
    Join Date
    Mar 2013
    Beans
    20

    Question wget: download specific links of a certain page

    Hi all,

    Is it possible to download only specific links with wget? Suppose I want to download all files linked from "www.example.com/page2" whose URLs start with "dl.example.com".
    I can use the following command to download specific file types, e.g. jpg, but I am not sure how to restrict it to "dl.example.com".

    Code:
    wget -r -p -A.jpg http://www.example.com/page2
    Thanks for your help.

  2. #2
    Join Date
    May 2010
    Location
    Tewkesbury uk
    Beans
    7,617
    Distro
    Ubuntu Development Release

    Re: wget: download specific links of a certain page

    Hi

    Do you have a more concrete example to work with (as opposed to www.example.com and dl.example.com)?

    You may get better help that way, as I'm not really sure I understand what you want.

    The -A option can take regular expressions.

    -A acclist --accept acclist
    -R rejlist --reject rejlist
    Specify comma-separated lists of file name suffixes or patterns to accept or reject. Note that if any of the wildcard
    characters, *, ?, [ or ], appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a
    suffix.
    Kind regards
    Join us on irc at #ubuntuforums. For web chat see here

    If you believe everything you read, you better not read. ~ Japanese Proverb

    Do not read newspapers on an empty stomach ~ Russian Proverb ~ BrunoLotse
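
The accept/reject rule quoted from the man page above can be illustrated with a small shell sketch. It only imitates the documented behaviour (an element with a wildcard character is a pattern, anything else is a suffix); it is not wget's actual code, and `matches_acclist` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch of the documented accept-list rule, NOT wget's implementation.
# matches_acclist FILE ELEMENT returns 0 if FILE would be accepted.
matches_acclist() {
    name=$1
    element=$2
    case $element in
        *\**|*\?*|*\[*|*\]*)
            # Element contains *, ?, [ or ]: treat it as a glob pattern
            # that has to match the whole file name.
            case $name in
                $element) return 0 ;;
            esac
            ;;
        *)
            # No wildcard characters: treat the element as a suffix.
            case $name in
                *"$element") return 0 ;;
            esac
            ;;
    esac
    return 1
}

matches_acclist photo.jpg .jpg    && echo "photo.jpg: accepted by suffix .jpg"
matches_acclist img001.png 'img*' && echo "img001.png: accepted by pattern img*"
matches_acclist photo.jpg 'img*'  || echo "photo.jpg: not matched by pattern img*"
```

Because the quoted text talks about file names, -A on its own probably cannot select a host such as dl.example.com. wget also has -H (--span-hosts) and -D (--domains) for that, so something along the lines of `wget -r -l1 -H -Ddl.example.com -A.jpg http://www.example.com/page2` may be worth trying; that command is untested here and will likely need tuning.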

  3. #3
    Join Date
    Feb 2013
    Beans
    Hidden!

    Re: wget: download specific links of a certain page

    @matt_symes IIRC, those patterns are shell globs, not regular expressions.

    @OP If you cannot accomplish what you want with wget's -A/-R, have a look at curl. It lets you do some neat things with URLs, like
    Code:
    curl 'http://{one,two}.host[1-5].com' -o "#1_#2"
    Last edited by schragge; March 30th, 2013 at 04:03 PM.
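
To make the curl example above more concrete, here is a rough sketch that prints the URL and output file pairs the glob should expand to. It imitates the expansion with plain loops and never calls curl; `expand_glob` is a hypothetical name, and the order in which curl itself walks the combinations is not something this sketch tries to reproduce.

```shell
#!/bin/sh
# Imitate curl's URL globbing for 'http://{one,two}.host[1-5].com' -o "#1_#2".
# {one,two} is a set and fills "#1"; [1-5] is a numeric range and fills "#2".
expand_glob() {
    for name in one two; do
        for n in 1 2 3 4 5; do
            # Left: the URL that gets fetched. Right: the file it is saved to.
            printf '%s -> %s\n' "http://$name.host$n.com" "${name}_$n"
        done
    done
}

expand_glob
```

So a single curl command line covers ten URLs, and each download lands in its own file instead of overwriting the previous one.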

  4. #4
    Join Date
    May 2010
    Location
    Tewkesbury uk
    Beans
    7,617
    Distro
    Ubuntu Development Release

    Re: wget: download specific links of a certain page

    Hi

    "IIRC, those patterns are shell globs, not regular expressions."
    A fair point. I'll be tighter with my definitions and language.

    Kind regards

  5. #5
    Join Date
    Nov 2008
    Location
    Kingdom of cookies
    Beans
    Hidden!
    Distro
    Ubuntu Development Release

    Re: wget: download specific links of a certain page

    It takes some crafty work, but something like this should do it:
    Code:
    wget  -qO- http://www.example.com | grep -i "http://dl.example.com" | awk -F"http://" '{print $2}' | awk -F'"' '{print $1}' | wget -i -
    If you only want to download <a href> links:
    Code:
    wget  -qO- http://www.example.com | grep -i '<a href="http://dl.example.com' | awk -F"http://" '{print $2}' | awk -F'"' '{print $1}' | wget -i -
    There are many other combinations...

    Downloading images...
    Code:
    wget  -qO- http://www.example.com | grep -i '<img' | awk -F'<img src="' '{print $2}' | awk -F'"' '{print $1}' | grep dl.example.com | wget -i -
    All of them will need tuning, though, because every site is different.
    Last edited by sandyd; March 30th, 2013 at 05:45 PM.
    Ubuntu Forums Moderation Staff || SandyDNET
    Twitter: @CatchesAStar | Last.fm
    Ubuntu Membership via Forum Contributions
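
The grep/awk pipelines above pick up at most one URL per HTML line. A variation using grep -o is sketched below, under the assumption that the URLs sit inside double quotes and that your grep supports -o (GNU and BSD grep do). `extract_urls` and the sample HTML are made up for illustration, and the example runs on a local string so it needs no network access.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print every double-quoted
# string that starts with the given prefix, one per line, duplicates removed.
extract_urls() {
    # grep -o emits each quoted string on its own line, so several links
    # on one HTML line are all found; tr strips the surrounding quotes;
    # awk keeps only the strings that begin with the prefix.
    grep -o '"[^"]*"' | tr -d '"' | awk -v p="$1" 'index($0, p) == 1' | sort -u
}

# Made-up page content standing in for www.example.com/page2.
sample_html='<a href="http://dl.example.com/a.zip">A</a> <a href="http://dl.example.com/b.zip">B</a>
<img src="http://dl.example.com/pic.jpg">
<a href="http://www.example.com/other.html">other</a>'

printf '%s\n' "$sample_html" | extract_urls 'http://dl.example.com/'
```

Against a live page, the same function could be dropped into the pipeline from the post, e.g. `wget -qO- http://www.example.com/page2 | extract_urls 'http://dl.example.com/' | wget -i -`. As noted above, expect to tune it per site: relative links, single-quoted attributes and https URLs are not handled.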
