Page 1 of 3
Results 1 to 10 of 27

Thread: downloading a webpage...

  1. #1
    Join Date
    Jul 2008
    Beans
    1,706

    downloading a webpage...

    How would I download a webpage in C++ or C (doesn't matter which)? I'll also take answers in Perl, since I'm learning it (kinda) and the Perl answer would probably be quicker to write than C/C++.

  2. #2
    Join Date
    Oct 2005
    Location
    Davao, Philippines
    Beans
    4,830

    Re: downloading a webpage...

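    In Perl:
    Code:
    use LWP::Simple;
    # get() returns undef on failure, so check before using the result
    my $content = get('http://www.ubuntuforums.org');
    die "Download failed\n" unless defined $content;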

  3. #3
    Join Date
    Jul 2008
    Beans
    1,706

    Re: downloading a webpage...

    Thanks. I've googled around but found nothing for C/C++. Perl definitely looks like the easiest. Basically, I'm trying to make my own web crawler.

  4. #4
    Join Date
    Feb 2008
    Location
    52°38'41.6"N/1°19'43.6"E
    Beans
    Hidden!

    Re: downloading a webpage...

    I just had a look at what the conky RSS C code does: it uses the curl libraries (libcurl). Actually crawling the content is another matter, but curl is probably a good choice for the comms, as it should simplify things a little.

    I did find some C examples with curl here: http://curl.haxx.se/libcurl/c/example.html

    Further searches on curl and C should give you more info.

    Hope that helps

  5. #5
    Join Date
    Jul 2008
    Beans
    1,706

    Re: downloading a webpage...

    Well, my algorithm (feel free to comment) would be:

    download the webpage
    parse the string for links
    follow a randomly selected link
    repeat until out of links
    backtrack, then rinse and repeat
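
The steps above amount to a randomized depth-first walk, which could be sketched roughly like this in C++. To keep the sketch self-contained, the download and link-parsing steps are replaced by a lookup in a hypothetical in-memory map (`Web`, page to links). The `visited` set is an added assumption that is not in the list above; without it the loop would never terminate on pages that link back to each other.

```cpp
// Sketch: the crawl loop as a randomized depth-first walk with backtracking.
#include <cstddef>
#include <map>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for "download page, parse links": page -> outgoing links.
using Web = std::map<std::string, std::vector<std::string>>;

// Returns the pages in the order they were visited, starting from `start`.
std::vector<std::string> crawl(const Web& web, const std::string& start, unsigned seed) {
    std::mt19937 rng(seed);
    std::set<std::string> visited{start};   // assumption: never revisit a page
    std::vector<std::string> order{start};
    std::vector<std::string> path{start};   // current trail, used for backtracking

    while (!path.empty()) {
        // Collect the not-yet-visited links of the page at the end of the trail.
        std::vector<std::string> unvisited;
        auto it = web.find(path.back());
        if (it != web.end()) {
            for (const auto& link : it->second)
                if (!visited.count(link)) unvisited.push_back(link);
        }
        if (unvisited.empty()) {            // out of links: backtrack one step
            path.pop_back();
            continue;
        }
        // Follow a randomly selected unvisited link.
        std::uniform_int_distribution<std::size_t> pick(0, unvisited.size() - 1);
        const std::string next = unvisited[pick(rng)];
        visited.insert(next);
        order.push_back(next);
        path.push_back(next);
    }
    return order;
}
```

On the real web the walk would also need a depth or page limit, since the set of reachable pages is effectively unbounded.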

  6. #6
    Join Date
    Apr 2007
    Beans
    14,781

    Re: downloading a webpage...

    Quote (originally posted by jimi_hendrix):
    Well, my algorithm (feel free to comment) would be:

    download the webpage
    parse the string for links
    follow a randomly selected link
    repeat until out of links
    backtrack, then rinse and repeat

    Well, I'd use a more structured approach.

    I'd use a library for downloading it, a library for parsing HTML or XHTML, and follow each link.

    String parsing and regexes on structured markup are silly.

  7. #7
    Join Date
    Jul 2008
    Beans
    1,706

    Re: downloading a webpage...

    Is all that in cURL?

  8. #8
    Join Date
    May 2008
    Beans
    Hidden!

    Re: downloading a webpage...

    Quote (originally posted by LaRoza):
    String parsing and regexes on structured markup are silly.

    Not necessarily. If you only went by the HTML markup, you'd miss URLs that appear in other places.

    On the other hand, a regular expression to find URLs doesn't sound particularly easy...

    Quote:
    Is all that in cURL?

    cURL is only a transfer library (HTTP and similar protocols), as far as I know. It doesn't parse HTML.
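
On the regular-expression point, a deliberately loose pattern gets surprisingly far for absolute links. The sketch below uses C++ `std::regex`. The function name `find_urls` and the pattern are illustrative assumptions, not a full URL grammar: the pattern takes `http://` or `https://` and then everything up to whitespace, a quote, or an angle bracket.

```cpp
// Sketch: pull absolute http(s) URLs out of arbitrary text with std::regex.
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> find_urls(const std::string& text) {
    // Loose pattern (assumption): scheme, then any run of characters that is
    // not whitespace, a quote, or an angle bracket.
    static const std::regex url_re(R"(https?://[^\s"'<>]+)");
    std::vector<std::string> urls;
    for (std::sregex_iterator it(text.begin(), text.end(), url_re), end; it != end; ++it)
        urls.push_back(it->str());
    return urls;
}
```

This finds URLs both inside `href` attributes and in plain text, but it misses relative links and can pick up trailing punctuation, which is where a real HTML parser earns its keep.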

  9. #9
    Join Date
    Jul 2008
    Beans
    1,706

    Re: downloading a webpage...

    OK, I'll just use Perl then.

  10. #10
    Join Date
    Jul 2008
    Beans
    1,706

    Re: downloading a webpage...

    Where would the best place to start crawling be?
