
View Full Version : Size of Wikipedia



Trzone
September 28th, 2008, 10:10 PM
I know that Wikipedia has these so-called "dumps", but does anyone know the actual size of all the information Wikipedia has? Just curious.

jespdj
September 28th, 2008, 10:15 PM
Where else would you find the answer to that than on ...Wikipedia ?! ;)

http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

schauerlich
September 28th, 2008, 10:15 PM
wget -R http://wikipedia.org



(Just kidding, that will take hours and get you nothing useful. :) )

Trzone
September 28th, 2008, 10:28 PM
theo@Ubuntu:~$ wget -R http://wikipedia.org
wget: missing URL
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.



I seriously want to know, and well, the Wikipedia article is only specific in terms of books, not actual computer data! hehe

LaRoza
September 28th, 2008, 10:30 PM
theo@Ubuntu:~$ wget -R http://wikipedia.org
wget: missing URL
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.

I seriously want to know, and well, the Wikipedia article is only specific in terms of books, not actual computer data! hehe

The data would be stored in a database; I think they use MySQL.

http://stats.wikimedia.org/EN/Sitemap.htm

schauerlich
September 28th, 2008, 10:37 PM
Don't put the "http://" in there.

The data would be stored in a database; I think they use MySQL.

http://stats.wikimedia.org/EN/Sitemap.htm

Also, it's -r, not -R.

t0p
September 28th, 2008, 10:47 PM
Don't put the "http://" in there.

The data would be stored in a database; I think they use MySQL.

http://stats.wikimedia.org/EN/Sitemap.htm

No, the "http://" does need to be in there. But you should use the "-r" flag not the "-R" flag. Ergo~


wget -r http://wikipedia.org

LaRoza
September 28th, 2008, 10:48 PM
No, the "http://" does need to be in there. But you should use the "-r" flag not the "-R" flag. Ergo~


wget -r http://wikipedia.org

It was a suggestion, but I see now.

I should have RTM'd first.

snova
September 28th, 2008, 10:50 PM
A static HTML dump is about 14.3 GB, compressed with 7zip. Does that answer your question?

Trzone
September 28th, 2008, 11:16 PM
I think I want to rephrase my question; it was way too vague. What is the size of the combined databases that Wikipedia owns? :)

init1
September 29th, 2008, 01:05 AM
wget -R http://wikipedia.org



(Just kidding, that will take hours and get you nothing useful. :) )
Heh yeah I tried that once :D

Trzone
September 29th, 2008, 11:14 AM
That command doesn't seem to be working :P but um, I think that the statistics are behind by at least 2 years, due to the absolutely massive scale that is Wikipedia.

snova
September 30th, 2008, 01:34 AM
I think I want to rephrase my question; it was way too vague. What is the size of the combined databases that Wikipedia owns? :)

You can download that too, as a dump of the database tables. You could find out the size by starting the download; but then, that's textual SQL, and possibly not a good indication of the size of the binary MySQL tables.
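snova's suggestion can be done without downloading anything: an HTTP HEAD request returns the server's Content-Length header, which tells you a dump file's size up front. A minimal Python sketch (the dump URL in the comment is illustrative only; check dumps.wikimedia.org for current filenames):

```python
import urllib.request

def human_gb(nbytes):
    """Convert a byte count to gigabytes (GiB) for display."""
    return nbytes / (1024 ** 3)

def remote_size(url):
    """Ask the server for a file's size via an HTTP HEAD request,
    so nothing is actually downloaded."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])

# Hypothetical dump filename -- browse dumps.wikimedia.org for real ones:
# size = remote_size("https://dumps.wikimedia.org/enwiki/latest/"
#                    "enwiki-latest-pages-articles.xml.bz2")
# print(f"{human_gb(size):.1f} GB")
```

Summing `remote_size` over all the files listed for a dump would give the total on-disk size of that dump, though (as noted above) the compressed SQL text is still only a rough proxy for the live database tables.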