[SOLVED] how to steal the content of the web site?



hechizera
July 13th, 2008, 04:26 PM
hi guys!

Is there any way to, so to speak, "steal" ( :) ) the content of a website? Mainly I need to "steal" the text and pictures. Saving every page to my computer one by one isn't a good option, because it takes a lot of time. I'd like to do it automatically somehow, if that's possible.

So... please, please, somebody help me.

bapoumba
July 13th, 2008, 04:31 PM
What website? Any copyrights? Do they allow it? :confused:
These tools exist, and they could drain a server.

hechizera
July 13th, 2008, 04:35 PM
It is an online cookbook with very good recipes. I want it on my computer, because it would be very bad if one day the site crashed or something.
I am not sure if they allow it, probably not, but who knows. Could you please tell me what these tools are? I mean names and so on.

ssam
July 13th, 2008, 04:38 PM
wget can recursively download stuff.

Masoris
July 13th, 2008, 04:39 PM
What do you mean? Do you mean you want to download every component of a website?

Then HTTrack could do that.
http://www.httrack.com/

bapoumba
July 13th, 2008, 04:40 PM
If they do not allow it, then you should not do it. Maybe this recipe site has some user space where you could ask, or maybe they publish books or something?

hechizera
July 13th, 2008, 04:42 PM
ssam, thank you, I will try it! :)

Masoris, I don't really want everything from the site, just the recipes with pictures :)

hechizera
July 13th, 2008, 04:45 PM
bapoumba, I will read more about their site and also look at wget. If it is not too illegal, then I will use it.

They have links to the original books, but unfortunately those are in languages I'm not very good at, and they have no pictures, so I want exactly that site :)

roaldz
July 13th, 2008, 04:47 PM
It is probably allowed if you just download the website to browse it offline. If you publish the contents anywhere else, you're (probably) doing something illegal.


wget -r www.specificwebsite.com

This downloads all the linked pages recursively.

Maybe there are better tools for this, but I know this works :)
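One detail worth knowing: GNU wget's -r stops at five levels of links by default (the documented default for --level), which is the same idea as the "number of levels down" setting laxmanb mentions for IE. A minimal sketch, assuming GNU wget and a hypothetical recipe section at /recipes/; the function names and the default depth of 2 are illustrative, not from the thread:

```shell
#!/usr/bin/env bash
# Sketch: fetch pages up to a chosen number of link hops from the start page.
# depth_limited_args / depth_limited_fetch are hypothetical helper names.

# Build the wget argument list so it can be inspected before running.
depth_limited_args() {
  local url="$1" depth="${2:-2}"   # depth defaults to 2 (arbitrary example value)
  DEPTH_ARGS=(
    --recursive
    "--level=$depth"    # stop after this many link hops (wget's default is 5)
    --no-parent         # never climb above the starting directory
    --page-requisites   # also download the images/CSS each page needs
    --convert-links     # rewrite links so the local copy browses offline
    --wait=2            # pause between requests to go easy on the server
    "$url"
  )
}

# Actually run wget with those arguments.
depth_limited_fetch() {
  depth_limited_args "$@"
  wget "${DEPTH_ARGS[@]}"
}
```

Usage would be something like `depth_limited_fetch http://www.specificwebsite.com/recipes/ 3`. Starting from the recipe section and adding --no-parent keeps wget from wandering into the rest of the site.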

laxmanb
July 13th, 2008, 04:51 PM
I remember Internet Explorer letting you save pages for offline viewing, and it could also fetch every page up to a certain number of levels down from the current page, so you could get an entire website offline if you wanted.

Does IE still have that? Does Firefox have something equivalent as an extension?

PS: Does mentioning IE get you in trouble here?

Samhain13
July 13th, 2008, 04:54 PM
I agree with Roaldz. Every page you visit gets cached by your browser anyway, so you already have a copy of the pages you visited and their content on your computer, whether the owner allows copying or not. Republishing that content without permission is another story.

bapoumba
July 13th, 2008, 04:55 PM
PS: Does mentioning IE get you in trouble here?
No, it should not :D

hechizera, please respect their copyrights and, as roaldz mentioned, do not republish anything :)

Polygon
July 13th, 2008, 05:04 PM
If you're going to use wget, make SURE you use a delay argument (check the man page; I think it's -w).

Be sure to do this, because if you don't set a delay, the server might actually ban you: it may think you are trying to DDoS it because of all the rapid download requests coming from wget.

So do something like: wget --mirror -np -w 5 url
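Polygon's delay advice can be written out a little more fully. A minimal sketch, assuming GNU wget: every flag below is a standard GNU wget option, but the 5-second wait, the 100 KB/s cap and the helper names are arbitrary examples, not anything the site requires:

```shell
#!/usr/bin/env bash
# Sketch of a "polite" mirror. polite_mirror_args / polite_mirror are
# hypothetical helper names; tune the wait and rate values to taste.

# Build the argument list so it can be checked before anything is downloaded.
polite_mirror_args() {
  local url="$1"
  MIRROR_ARGS=(
    --mirror            # same as -r -N -l inf --no-remove-listing
    --no-parent         # never ascend above the starting directory
    --wait=5            # pause 5 seconds between requests
    --random-wait       # vary the pause between 0.5x and 1.5x of --wait
    --limit-rate=100k   # cap download bandwidth at about 100 KB/s
    --page-requisites   # also fetch the images and CSS each page needs
    --convert-links     # rewrite links so the copy can be browsed offline
    "$url"
  )
}

# Run the mirror with those arguments.
polite_mirror() {
  polite_mirror_args "$1"
  wget "${MIRROR_ARGS[@]}"
}
```

Usage: `polite_mirror http://www.specificwebsite.com/recipes/`. Note the double dash in --mirror: a single-dash -mirror is read by wget as a bundle of short options, not as the mirror switch.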

Mr. Picklesworth
July 13th, 2008, 05:11 PM
No need to be concerned about this stuff. Archive.org has a nice backup of everything.

hechizera
July 13th, 2008, 05:25 PM
roaldz, I need it for personal use only, so it's probably not illegal :)

laxmanb, it's fine about IE. It has this saving feature, and it's also possible to save a web page in Firefox, but the method I know is very slow. Maybe there's another one I have no clue about.

So... thanks to everyone for your answers and help! I finally have the whole 22 MB of recipes on my computer! Thank you, guys! :)

hechizera
July 13th, 2008, 05:27 PM
Polygon, oops, I didn't use that argument, but they didn't ban me anyway, so probably no worries :) Thanks for the answer anyway; next time I will do as you wrote!

Mr. Picklesworth, archive.org? I have never heard of it :)

hechizera
July 13th, 2008, 05:30 PM
I marked it as solved. Thanks once again, and bye bye :)

intense.ego
July 13th, 2008, 07:03 PM
Out of curiosity, would the wget method work for the following example:

the URL is www.example.com

but some of what you want is on content.example.com

would using wget on www.example.com get everything, including what is on content.example.com?

mr.propre
July 13th, 2008, 07:13 PM
No. www and content are separate subdomains (separate host names), usually served from separate directories on the server, with www often being the default one, although it doesn't have to be.
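The "no" matches GNU wget's default behaviour: recursive retrieval stays on the starting host, so links pointing to content.example.com are skipped. GNU wget's --span-hosts (-H) lifts that restriction, and --domains (-D) limits which domains it may wander into (note that -D alone does not turn on -H). A minimal sketch, with example.com as a placeholder and the depth, wait and helper names as illustrative assumptions:

```shell
#!/usr/bin/env bash
# Sketch: recursive download that may follow links from www.example.com to
# other hosts, but only hosts inside the given domain list
# (e.g. content.example.com). span_args / mirror_with_subdomains are hypothetical names.

span_args() {
  local url="$1" domains="$2"   # domains: comma-separated list, e.g. "example.com"
  SPAN_ARGS=(
    --recursive
    --level=3               # limit depth; 3 is an arbitrary example value
    --span-hosts            # allow leaving the starting host...
    "--domains=$domains"    # ...but only for hosts in these domains
    --wait=5                # stay polite between requests
    --page-requisites       # fetch images/CSS, even from the other subdomain
    --convert-links         # make the local copy browsable offline
    "$url"
  )
}

mirror_with_subdomains() {
  span_args "$1" "$2"
  wget "${SPAN_ARGS[@]}"
}
```

Usage: `mirror_with_subdomains http://www.example.com/ example.com`. Without the --domains filter, --span-hosts would let wget follow links to any site on the internet, so the two options belong together.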

Masoris
July 14th, 2008, 06:35 AM
You can also use the Firefox extension Scrapbook.
http://amb.vis.ne.jp/mozilla/scrapbook/

DigitalDingo
July 14th, 2008, 09:13 AM
Or Google Notebook: http://www.google.com/notebook/. You'll then be able to access all your notes from any computer with internet access.

webcabbie
December 1st, 2010, 08:37 AM
I am looking to do something similar. There is a forum I read that uses vBulletin, but the site goes down a lot, and I hate reading by scrolling on a screen.

I clicked File > Save, grabbed the relevant HTML document, and opened it with gedit.

Now how do I remove all the HTML tags? Is there a search-and-replace function I can use, or are there gedit plugins?
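Rather than search-and-replace inside gedit, the saved file can be cleaned from the command line. A crude sketch using sed, where strip_tags is a hypothetical helper name: it deletes anything between < and > on each line and decodes a few common entities. It assumes tags don't span multiple lines and does not drop the contents of <script> or <style> blocks, so treat its output as rough:

```shell
#!/usr/bin/env bash
# Crude HTML tag stripper (sketch). Reads the files given as arguments,
# or standard input if none are given, and writes plain text to stdout.
# Limitations: tags split across lines and <script>/<style> contents survive.
strip_tags() {
  sed -e 's/<[^>]*>//g' \
      -e 's/&nbsp;/ /g' \
      -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&amp;/\&/g' \
      "$@"
}
# Tags are removed first, so a decoded "&lt;" is not mistaken for a tag;
# &amp; is decoded last so "&amp;lt;" is not decoded twice.
```

Usage: `strip_tags thread.html > thread.txt`. If a text-mode browser such as lynx or w3m is installed, `lynx -dump thread.html > thread.txt` usually gives cleaner, better-formatted output, since it actually parses the HTML instead of pattern-matching it.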