Thread: image extracting from webpage

  1. #1
    Join Date
    Aug 2010
    Beans
    35
    Distro
    Ubuntu 10.10 Maverick Meerkat

    Question image extracting from webpage

    Hi, I recently saved a large number of web pages from a website on my computer, and all of them contain images.
    I need help extracting the images from all of these pages.
    There are about 17000 saved web pages in the folder. I am not sure how to extract images from web pages in batch; a Google search didn't turn up anything.
    Is there a tool for the job, or should a script be written for it?
    Any help would be appreciated.
    Thanks in advance.

  2. #2
    Join Date
    May 2010
    Location
    UK
    Beans
    305
    Distro
    Ubuntu 10.10 Maverick Meerkat

    Re: image extracting from webpage

    Are all the webpages in HTML format or a single web page archive?
    If they are HTML, all the images should be under a separate directory (e.g. /webpage/images/header.jpg)
    Website - Very serious work in progress, I don't work on this as much as I'd like but I'm mainly looking at the backend right now.

  3. #3
    Join Date
    Aug 2007
    Location
    PA
    Beans
    363
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: image extracting from webpage

    Quote Originally Posted by ironic.demise View Post
    Are all the webpages in HTML format or a single web page archive?
    If they are HTML, all the images should be under a separate directory (e.g. /webpage/images/header.jpg)
    If they are in a separate folder, you can cd to the website dir

    and run "rm -i *.jpg *.png *.bmp" etc. I don't know if this will descend into subdirectories, but I'm afraid to put -r in there as it might delete directories.
    Linux.
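
For readers following along: rm deletes files rather than collecting them, and a plain glob such as *.jpg only matches the current directory. If the images really were saved in subfolders, a minimal sketch of gathering them into one place could use find instead. The function name collect_images and both paths are hypothetical, and the sketch assumes the destination lies outside the source tree.

```shell
#!/bin/sh
# collect_images SRC DEST (hypothetical helper, not from the thread):
# copy every image file found anywhere under SRC into the single folder DEST.
# Assumes DEST is outside SRC; files with the same name overwrite each other.
collect_images() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    # -iname matches case-insensitively, so photo.JPG is picked up too.
    find "$src" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \
        -o -iname '*.png' -o -iname '*.gif' -o -iname '*.bmp' \) \
        -exec cp -- {} "$dest"/ \;
}

# Usage (paths are placeholders):
#   collect_images ~/website ~/collected-images
```

Because this copies instead of deleting, the saved pages stay intact and the command can be rerun safely.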

  4. #4
    Join Date
    Aug 2010
    Beans
    35
    Distro
    Ubuntu 10.10 Maverick Meerkat

    Re: image extracting from webpage

    Quote Originally Posted by ironic.demise View Post
    Are all the webpages in HTML format or a single web page archive?
    If they are HTML, all the images should be under a separate directory (e.g. /webpage/images/header.jpg)
    No, the images are not saved in separate folders;
    otherwise I would have done what pavel989 suggested.
    I downloaded all the pages using wget.
    I have attached 3 files for you to have a look at.
    Thanks for such a quick reply.

    PS: I have renamed the files to .txt format,
    but they should open with a web browser.

  5. #5
    Join Date
    Mar 2006
    Location
    Sweden
    Beans
    220
    Distro
    Ubuntu Development Release

    Re: image extracting from webpage

    Right-click and click "Save pic as". Worked for me!

  6. #6
    Join Date
    May 2010
    Location
    UK
    Beans
    305
    Distro
    Ubuntu 10.10 Maverick Meerkat

    Re: image extracting from webpage

    He wanted a shell script to do the job though...
    Can't you use
    Code:
    wget url/{*.jpg,*.png,*.bmp,*.gif}
    the same way that some shell scripts only wget .deb files or .zip files?

  7. #7
    Join Date
    Aug 2007
    Location
    PA
    Beans
    363
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: image extracting from webpage

    Linux.

  8. #8
    Join Date
    Aug 2010
    Beans
    35
    Distro
    Ubuntu 10.10 Maverick Meerkat

    Re: image extracting from webpage

    Quote Originally Posted by polki@mac.com View Post
    Right-click and click "Save pic as". Worked for me!
    Lol, I have around 17000 files. I'm seriously not going to do that manually one by one.
    What's the use of having a computer then?
    But thanks for the try anyway.

    @ironic.demise
    No, I tried that, but it did not work.
    Here's what I did to retrieve the web pages:

    for x in {1..99999}; do wget --wait=05 --random-wait http://500px.com/photos/$x; done

    @pavel989
    I checked out the link and tried to follow it, but got lost after the first few lines.
    I don't have much experience with Linux.
    I'll probably give it another shot later.
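
Since the pages were fetched one at a time with plain wget, each saved file is most likely just the HTML, with the pictures referenced by URL rather than stored locally. A rough sketch of pulling those URLs out of every saved page follows. The function name list_image_urls and the paths are hypothetical, and the pattern assumes ordinary <img ... src="..."> tags with double-quoted attributes; it has not been checked against the actual saved pages.

```shell
#!/bin/sh
# list_image_urls DIR (hypothetical helper, not from the thread):
# print each distinct src="..." value of the <img> tags found in files under DIR.
# Assumes double-quoted src attributes; relative URLs are printed as-is.
list_image_urls() {
    # grep: -r recurse, -h omit file names, -o print only the match,
    #       -i ignore case, -E extended regular expressions.
    grep -rhoiE '<img[^>]+src="[^"]+"' "$1" \
        | sed -E 's/.*[sS][rR][cC]="([^"]+)".*/\1/' \
        | sort -u
}

# Usage (paths are placeholders):
#   list_image_urls ~/saved-pages > urls.txt
#   wget --wait=5 --random-wait -i urls.txt
```

Writing the list to a file first makes it easy to inspect or filter before downloading anything, and passing it to a single wget invocation with -i lets --wait pause between the individual downloads.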

  9. #9
    Join Date
    Aug 2007
    Location
    PA
    Beans
    363
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: image extracting from webpage

    Ultimately, I was recommending recording a macro in a text editor.

    But have a look at something like this instead: http://www.ehow.com/how_7310377_extr...tml-pages.html

    Google around for a tool that extracts text from a webpage. I searched for "extract text from a webpage".

  10. #10
    Join Date
    Nov 2010
    Beans
    2

    Re: image extracting from webpage

    I suggest a different approach to downloading the site:
    Re-download the website using HTTrack (a very powerful tool). Depending on the selected options, this will give you a complete offline copy including all the images (instead of the txt files created by wget). After creating the offline copy, you can simply collect the JPGs from your local drive.
