C++ and Internet



Redscare
June 7th, 2007, 04:10 AM
I am trying to write a program that would search multiple search engines at once in a certain way. Can anyone tell me how to access a website/download a webpage (for example, download the source of http://www.ubuntuforums.org) using purely C++? Thank you.

treak007
June 7th, 2007, 04:42 AM
The standard C++ library has no networking facilities. One option is to call the system function from stdlib.h to run an external tool such as wget. Something like:



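#include <stdlib.h>

int main(void)
{
    /* Run wget via the shell; it saves the page as index.html in the current directory. */
    int status = system("wget www.google.com");

    /* system() returns nonzero if the command failed or could not be run. */
    return status == 0 ? 0 : 1;
}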



That would download the index.html file from Google. You could then parse this file to extract the data you need.
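As a rough sketch of the "parse this file" step, the following reads a saved page into a string and pulls out the text between two markers, such as the page title. The names read_file and extract_between are made up for illustration, and the search is a naive, case-sensitive substring match, not a real HTML parser.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read a whole file into a string.
// Returns an empty string if the file cannot be opened.
std::string read_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Hypothetical helper: return the text between the first `open` marker and
// the next `close` marker, or "" if either marker is missing.
std::string extract_between(const std::string& text,
                            const std::string& open,
                            const std::string& close)
{
    const std::string::size_type start = text.find(open);
    if (start == std::string::npos)
        return "";
    const std::string::size_type from = start + open.size();
    const std::string::size_type end = text.find(close, from);
    if (end == std::string::npos)
        return "";
    return text.substr(from, end - from);
}
```

After wget has run, something like extract_between(read_file("index.html"), "<title>", "</title>") would give the page title, assuming the page writes its title tags in lowercase.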

Hope that helps.

PandaGoat
June 7th, 2007, 04:54 AM
It is quite complicated, so I will only explain the theory and not show code. HTTP (Hypertext Transfer Protocol) is a protocol. You use sockets to connect to servers, which are identified by IP addresses, and "talk" to them: you send a message, and the server sends one back. The messages you send must follow the format defined by the protocol the server uses, in this case HTTP.

1. Connect to the web site's IP address
2. Request the .html file (or whatever other resource) you want
3. Receive the data the server sends back to your socket

For HTTP, you would use a TCP client to connect to the server, usually on port 80.
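The three steps above can be sketched with POSIX sockets roughly as follows. This is a minimal illustration, not a robust client: it assumes a POSIX system, plain HTTP on port 80, and no HTTPS, timeouts, or chunked-encoding handling. The names build_get_request and http_get are hypothetical. The returned string contains the raw response, headers included.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: format a minimal HTTP/1.1 GET request.
std::string build_get_request(const std::string& host, const std::string& path)
{
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}

// Hypothetical helper: fetch `path` from `host` over plain HTTP (port 80).
std::string http_get(const std::string& host, const std::string& path)
{
    // Step 1: resolve the host name and connect a TCP socket.
    addrinfo hints = addrinfo();
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    addrinfo* results = NULL;
    if (getaddrinfo(host.c_str(), "80", &hints, &results) != 0)
        throw std::runtime_error("could not resolve " + host);

    int fd = -1;
    for (addrinfo* ai = results; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(results);
    if (fd == -1)
        throw std::runtime_error("could not connect to " + host);

    // Step 2: send the request; send() may accept fewer bytes than asked.
    const std::string request = build_get_request(host, path);
    std::size_t sent = 0;
    while (sent < request.size()) {
        const ssize_t n = send(fd, request.data() + sent, request.size() - sent, 0);
        if (n <= 0) {
            close(fd);
            throw std::runtime_error("send failed");
        }
        sent += static_cast<std::size_t>(n);
    }

    // Step 3: read until the server closes the connection.
    std::string response;
    char buffer[4096];
    ssize_t n;
    while ((n = recv(fd, buffer, sizeof buffer, 0)) > 0)
        response.append(buffer, static_cast<std::size_t>(n));
    close(fd);
    return response;
}
```

A call such as http_get("www.ubuntuforums.org", "/") would return the status line, the headers, a blank line, and then the page body, provided the server answers on port 80.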

Wikipedia has a good explanation of sending requests to a server (http://en.wikipedia.org/wiki/HTTP)

Good tutorial for sockets (http://beej.us/guide/bgnet/output/html/multipage/index.html)
I wrote some code for a socket client a while back:

client.h (http://pandagoat.googlepages.com/client.h)
client.cpp (http://pandagoat.googlepages.com/client.cpp)

neo.patrix
June 7th, 2007, 03:36 PM
Hi

What exactly do you want to do? Create a search portal?

As far as I know, C++ is not a very good choice of language for working with the internet.

But I would suggest using web services (WSDL). There are many libraries for handling web services in Java. I am not sure about C++, but there should be some, because Java ultimately relies on native code through JNI for performance reasons.

Google provides web services and also APIs for using those services. I think you can download all that and a lot more from http://labs.google.com/?tab=wz

rgds,
Patrix.

Redscare
June 7th, 2007, 09:07 PM
Thank you for the quick replies, and I will do the reading, but I have a very specific humble request now after doing some reading; could someone please post source code that downloads the source of

http://www.google.com/search?hl=en&q=redscare&btnG=Google+Search
and prints it to the command line?
Thanks again.

WW
June 7th, 2007, 09:30 PM
What do you mean when you say "the source of..."?

EDIT: Never mind, I'm slow... :)

pmasiar
June 7th, 2007, 09:42 PM
Check, e.g., these Python examples:
http://www.voidspace.org.uk/python/articles/urllib2.shtml
http://docs.python.org/lib/module-urllib.html

winch
June 7th, 2007, 09:43 PM
Check out libcurl
http://curl.haxx.se/libcurl/

Simple C example.


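#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    CURL *handle;
    CURLcode result;

    handle = curl_easy_init();
    if (handle == NULL)
    {
        fprintf(stderr, "curl init failed!\n");
        return 1;
    }
    /* Follow HTTP redirects; libcurl expects a long for this option. */
    curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(handle, CURLOPT_URL, "http://www.google.com/search?hl=en&q=redscare&btnG=Google+Search");
    /* With no write callback set, the response body is written to stdout. */
    result = curl_easy_perform(handle);
    if (result != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(result));
    curl_easy_cleanup(handle);
    return result == CURLE_OK ? 0 : 1;
}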


Save as main.c, then compile and run with:


gcc -Wall main.c -o curl_test `curl-config --cflags --libs`
./curl_test
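The URL in the example above carries the search term in its q parameter. For terms containing spaces or other special characters, the term needs to be percent-encoded before it is placed in the URL. Below is a small hand-rolled sketch in C++; the names url_encode and build_search_url are hypothetical, and the "?q=" parameter layout is an assumption that will differ between search engines. libcurl also offers curl_easy_escape for the same job.

```cpp
#include <string>

// Hypothetical helper: percent-encode everything except the characters that
// RFC 3986 lists as unreserved (letters, digits, '-', '_', '.', '~').
std::string url_encode(const std::string& text)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];      // high four bits
            out += hex[c & 0x0F];    // low four bits
        }
    }
    return out;
}

// Hypothetical helper: append the encoded term as a "q" query parameter.
// Assumes `base` has no query string yet.
std::string build_search_url(const std::string& base, const std::string& term)
{
    return base + "?q=" + url_encode(term);
}
```

The resulting string can be passed to CURLOPT_URL in place of the hard-coded address, once per search engine.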

jfinkels
June 7th, 2007, 09:43 PM
Do it in Python; it's simple.

rich.bradshaw
June 7th, 2007, 09:45 PM
or just in a terminal

curl www.somesitehere.com

Redscare
June 7th, 2007, 10:13 PM
So is there no way to do this purely with the files provided in advance with the compiler (C++)? Also, how could I obtain the text off of a webpage? (for example, the results of http://www.google.com/search?hl=en&q=redscare&btnG=Google+Search) Thanks again for the quick responses.

WW
June 7th, 2007, 11:16 PM
Follow the link provided by winch, and read the libcurl documentation.

Here is a C++ variation of the program that winch provided. This version saves the entire web page in a string:

ctest.cpp


#include <iostream>
#include <string>
#include <curl/curl.h>

using namespace std;

string html;

// libcurl calls this for each chunk of the response. The chunk is size*nmemb
// bytes long and is NOT null-terminated, so append it with an explicit length
// instead of writing a terminator past the end of libcurl's buffer.
size_t write_data_callback(char *ptr, size_t size, size_t nmemb, void *stream)
{
    html.append(ptr, size*nmemb);
    return size*nmemb;
}

int main()
{
    CURL *handle;
    handle = curl_easy_init();
    if (handle == NULL)
    {
        cerr << "curl init failed!" << endl;
        return 1;
    }
    curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, write_data_callback);
    curl_easy_setopt(handle, CURLOPT_URL, "http://www.google.com/search?hl=en&q=redscare&btnG=Google+Search");
    html = "";
    CURLcode result = curl_easy_perform(handle);
    curl_easy_cleanup(handle);
    if (result != CURLE_OK)
    {
        cerr << "request failed: " << curl_easy_strerror(result) << endl;
        return 1;
    }
    cout << "The first 25 characters: " << html.substr(0,25) << endl;
    // find() returns string::npos if "<body" is absent; substr() would then throw.
    string::size_type body = html.find("<body",0);
    if (body != string::npos)
        cout << "The 25 characters starting from the first occurrence of \"<body\": " << html.substr(body,25) << endl;
    return 0;
}


Compile and run:


$ g++ -Wall ctest.cpp -o ctest `curl-config --cflags --libs`
$ ./ctest
The first 25 characters: <html><head><meta http-eq
The 25 characters starting from the first occurrence of "<body": <body bgcolor=#ffffff onl
$
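To get at the text of the page rather than its markup, one crude approach is to drop everything between '<' and '>' in the downloaded string. The sketch below does that and collapses runs of whitespace. The name strip_tags is hypothetical, and this is a simplification: it does not decode entities such as &amp;, it does not skip the contents of script or style elements, and a '>' inside an attribute value will confuse it. A real HTML parser is the safer choice for anything serious.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: remove anything that looks like a tag and collapse
// whitespace. Tags count as word separators, so "<p>a</p><p>b</p>" gives "a b".
std::string strip_tags(const std::string& html)
{
    std::string text;
    bool in_tag = false;          // currently between '<' and '>'
    bool pending_space = false;   // a separator was seen since the last character kept

    for (std::string::size_type i = 0; i < html.size(); ++i) {
        const char c = html[i];
        if (c == '<') {
            in_tag = true;
            pending_space = true;
            continue;
        }
        if (in_tag) {
            if (c == '>')
                in_tag = false;
            continue;
        }
        if (std::isspace(static_cast<unsigned char>(c))) {
            pending_space = true;
            continue;
        }
        // Emit at most one space between words, and none at the very start.
        if (pending_space && !text.empty())
            text += ' ';
        pending_space = false;
        text += c;
    }
    return text;
}
```

In the program above, calling strip_tags(html) after curl_easy_perform would give a rough plain-text view of the results page.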

WW
June 7th, 2007, 11:52 PM
Redscare wrote: "So is there no way to do this purely with the files provided in advance with the compiler (C++)?"
If you mean without some sort of external library, no. For a language in which libraries like this are "built in", you want Python (which adopts the "batteries included" philosophy).

kknd
June 8th, 2007, 03:07 AM
Please, Java or Python would be much better in this case! Both were made with the internet in mind (at least Java =) )