View Full Version : Download wikipedia



mmix
May 23rd, 2010, 03:17 AM
Latest complete dump of English Wikipedia:
http://dumps.wikimedia.org/enwiki/20100130/



# 2010-02-03 04:28:08 done Articles, templates, image descriptions, and primary meta-pages.
2010-02-03 04:28:08: enwiki 9541307 pages (462.716/sec), 9541307 revs (462.716/sec), 93.1% prefetched, ETA 2010-02-03 14:22:34 [max 26044759]

* This contains current versions of article content, and is the archive most mirror sites will probably want.
* pages-articles.xml.bz2 5.6 GB

f20fee309f8a24990ebd1b44365756af enwiki-20100130-site_stats.sql.gz
67306b6c4fcb212986dc370839052d19 enwiki-20100130-image.sql.gz
4e08666f590d366513a671095666880c enwiki-20100130-oldimage.sql.gz
85aa9c506312777e76e328c0616a3a47 enwiki-20100130-pagelinks.sql.gz
8b5055115adbb19d96865113cc24c230 enwiki-20100130-categorylinks.sql.gz
25f3220ea3e49c260d6d3879901a3306 enwiki-20100130-imagelinks.sql.gz
8bb4ffafe5a1358034e89bb3127c18fc enwiki-20100130-templatelinks.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-externallinks.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-langlinks.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-interwiki.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-user_groups.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-category.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-page.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-page_restrictions.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-page_props.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-protected_titles.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-redirect.sql.gz
e32d799e86f8d2be5ebb3eed048238c7 enwiki-20100130-all-titles-in-ns0.gz
7f81caa9975c8a4b3bb88e6a1a5cdfbf enwiki-20100130-abstract.xml
788f875bc8f0e46fbdd7743bc30d90be enwiki-20100130-stub-meta-history.xml.gz
c5a7c03c4e40639ef7eeca5aed298cc8 enwiki-20100130-stub-meta-current.xml.gz
45e63b0380c6b411e3a902dc68905ed7 enwiki-20100130-stub-articles.xml.gz
f078255570c07330edf2e2b68ef229cf enwiki-20100130-pages-articles.xml.bz2
340e7233bcfa705bd0518a7d039b4fe2 enwiki-20100130-pages-meta-current.xml.bz2
790e17f26f0a1101221b8fefff670abc enwiki-20100130-pages-logging.xml.gz
65677bc275442c7579857cc26b355ded enwiki-20100130-pages-meta-history.xml.bz2
da705eaf7a7e91bda803b256f3a8bf1b enwiki-20100130-pages-meta-history.xml.7z
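
The hex strings in the listing above look like MD5 digests, so a finished download can be checked against its line. A minimal sketch, assuming Python 3 and assuming the file was saved under the same name as in the listing; `md5_of_file`, `parse_checksum_line` and `verify` are names made up for this example, not part of any Wikimedia tool:

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so a multi-gigabyte dump never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line):
    """Split one 'digest filename' line (the layout used in the listing)."""
    digest, filename = line.split(None, 1)
    return digest.lower(), filename.strip()


def verify(path, checksum_line):
    """True if the file at `path` matches the digest on `checksum_line`."""
    expected, _filename = parse_checksum_line(checksum_line)
    return md5_of_file(path) == expected
```

Usage would be something like `verify("enwiki-20100130-pages-articles.xml.bz2", "f078255570c07330edf2e2b68ef229cf enwiki-20100130-pages-articles.xml.bz2")`. A `False` result most likely means a truncated or corrupted download.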

baddog144
May 23rd, 2010, 03:22 AM
er. ok. I can't possibly imagine anyone actually wanting this, except maybe to analyze. :/

lisati
May 23rd, 2010, 03:34 AM
Could be worse: I've seen screenshots of people experiencing epic failures trying to download the whole internet.

Kafubie
May 23rd, 2010, 03:34 AM
If I had 1 petabyte of free space... then yes... I would like to download that gargantuan 5.6 GB.

(1PB=1024TB)

Dayofswords
May 23rd, 2010, 03:41 AM
If I had 1 petabyte of free space... then yes... I would like to download that gargantuan 5.6 GB.

(1PB=1024TB)
This is actually just the English Wikipedia.

The total size unarchived is over 8 TB of data.

If I had the space (and plenty of it), speed, time, and computer processing power...
I'd download this, as my data pack-rat-ness would kick in.

coolbrook
May 23rd, 2010, 03:42 AM
Some light reading for globetrotters.

Kafubie
May 23rd, 2010, 03:43 AM
This is actually just the English Wikipedia.

The total size unarchived is over 8 TB of data.

I know.

I just want a PB of space..
Don't assume I thought that wiki was 200TB or something I AM NOT STUPID!

*cry+sob*=annoying screeching noise.
lol.

LeifAndersen
May 23rd, 2010, 04:15 AM
Actually... I am interested in downloading Wikipedia. Although I would rather do it as a sort of mirror, so that I could update it without having to redownload the whole thing again. Also, I'd probably split it across 3 days, so that my university doesn't get too cranky with me for downloading 5.6 GB of data in the space of a few hours.
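
One way to spread a download like that over several sessions is to ask the server only for the bytes that are not on disk yet, using an HTTP `Range` header. A minimal sketch, assuming Python 3 and assuming the dump server honours range requests (not confirmed anywhere in this thread); `build_resume_request` and `resume_download` are hypothetical helper names:

```python
import os
import shutil
import urllib.request


def build_resume_request(url, dest_path):
    """Build a request that asks only for the bytes not yet on disk.

    Returns the request and the number of bytes already downloaded."""
    already = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    request = urllib.request.Request(url)
    if already:
        request.add_header("Range", "bytes=%d-" % already)
    return request, already


def resume_download(url, dest_path):
    """Continue a partial download, or start from scratch if there is none.

    If the file is already complete the server will typically answer 416,
    which urllib raises as an HTTPError."""
    request, already = build_resume_request(url, dest_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the Range header. Any other status
        # means it is sending the whole file, so overwrite instead of append.
        mode = "ab" if already and response.status == 206 else "wb"
        with open(dest_path, mode) as out:
            shutil.copyfileobj(response, out)
```

Interrupting the script and calling `resume_download(url, "enwiki-20100130-pages-articles.xml.bz2")` again the next day would pick up where it stopped. Here `url` would be the dump directory posted above plus the file name, which is a guess at the full link rather than something given in the thread.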

Legendary_Bibo
May 23rd, 2010, 04:24 AM
Interesting actually, how large is it? Also how recent?

Dayofswords
May 23rd, 2010, 04:29 AM
Actually... I am interested in downloading Wikipedia. Although I would rather do it as a sort of mirror, so that I could update it without having to redownload the whole thing again. Also, I'd probably split it across 3 days, so that my university doesn't get too cranky with me for downloading 5.6 GB of data in the space of a few hours.

The publicly downloadable stuff doesn't include user data and some other things,
and the download total seems to be over 300 GB.

Legendary_Bibo
May 23rd, 2010, 04:29 AM
I could spare it if it's only 5.6 gb, how do I download it?

Dayofswords
May 23rd, 2010, 04:32 AM
I could spare it if it's only 5.6 gb, how do I download it?

Here:
http://dumps.wikimedia.org/enwiki/20100130/
Download each file listed there.

But the whole set is over 300 GB, perhaps hitting 400 GB.
That 5.6 GB is only one part: the current versions of the articles.

Legendary_Bibo
May 23rd, 2010, 04:37 AM
Here:
http://dumps.wikimedia.org/enwiki/20100130/
Download each file listed there.

But the whole set is over 300 GB, perhaps hitting 400 GB.
That 5.6 GB is only one part: the current versions of the articles.

Then what's in the big version? If it's just Wikipedia in all the other languages then I don't need it. Well, if anyone wants the decompressed version, I suggest getting this.

http://www.bhphotovideo.com/c/product/592236-REG/PROAVIO_EB10PM_10_0TB_10TB_editBOX_10PM_Array.html

Legendary_Bibo
May 23rd, 2010, 04:40 AM
I'm downloading the 5.6 gb version considering it's just the articles. I don't need that other stuff. I mean I only have 235gb free.

tgalati4
May 23rd, 2010, 05:20 AM
Crap, it won't fit on a DVD.

Guitar John
May 23rd, 2010, 05:42 AM
I don't need to.
Wife knows everything. 8-[

Legendary_Bibo
May 23rd, 2010, 06:04 AM
I'm at 95%!!

papangul
May 23rd, 2010, 06:15 AM
An application (with approx. 5 GB of data files) is available from the Nokia Ovi Store which makes the whole of Wikipedia available offline on a mobile phone. I guess it uses such a dump.

Legendary_Bibo
May 23rd, 2010, 06:23 AM
An application (with approx. 5 GB of data files) is available from the Nokia Ovi Store which makes the whole of Wikipedia available offline on a mobile phone. I guess it uses such a dump.

I saw that same app. It's being sold for $10 or something isn't it? That must have taken a lot of work.

Anyways now I'm decompressing it...it's going to take a while. I had to use chromium because it's more lightweight compared to firefox and I don't want to risk a freeze up.

Dayofswords
May 23rd, 2010, 06:46 AM
I saw that same app. It's being sold for $10 or something isn't it? That must have taken a lot of work.

Anyways now I'm decompressing it...it's going to take a while. I had to use chromium because it's more lightweight compared to firefox and I don't want to risk a freeze up.

a long while lol

as for that thing
it better have a syncing feature :P

Legendary_Bibo
May 23rd, 2010, 06:52 AM
a long while lol

as for that thing
it better have a syncing feature :P

Yep I found that it's taking people about 5 hours to decompress it. Then I have to run it through a wikipedia reader that can archive everything for me.

Legendary_Bibo
May 23rd, 2010, 07:46 AM
Well I decompressed it (24.9 GB!), and now I don't know what to do with it. The archivers I got can't seem to archive the XML file or even the compressed .bz2 file.
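
For just poking at the dump without a reader, the compressed file can be read as a stream, so the full 24.9 GB never has to sit on disk or in memory. A rough sketch, assuming Python 3 and assuming the dump is made of `<page>` elements that each have a `<title>` child (the usual MediaWiki export layout, but not confirmed in this thread); `local_name` and `iter_titles` are made-up names:

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(stream):
    """Yield page titles from an XML dump stream, one page at a time."""
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == "page":
            for child in elem:
                if local_name(child.tag) == "title":
                    yield child.text
            # Drop everything parsed so far so memory use stays flat.
            root.clear()
```

Something like `for title in iter_titles(bz2.open("enwiki-20100130-pages-articles.xml.bz2", "rb")): print(title)` would list article titles straight from the compressed download. It will still take a long while over the whole file, since every byte has to be decompressed.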

Legendary_Bibo
May 23rd, 2010, 08:08 AM
Oh and as a fair warning don't try to open it up through firefox or any web browser. It will make it crash. :P

Henry Flower
May 23rd, 2010, 08:54 AM
Well I decompressed it (24.9 GB!), and now I don't know what to do with it. The archivers I got can't seem to archive the XML file or even the compressed .bz2 file.

I use the WikiTaxi reader (Windows, but works in Wine). My WikiTaxi file is only about 8 GB, though I downloaded it around a year ago.

For those wanting to download in stages, there are wikipedia dumps available as torrents.

mmix
May 23rd, 2010, 08:55 AM
Here are a few ways to use it:

Ways to process and use Wikipedia dumps (http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/)


EChronicle:Wikipedia database (http://www.pilhokim.com/index.php?title=EChronicle:Importing_Wikipedia)

Building a (fast) Wikipedia offline reader (http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html)

How to mirror Wikipedia (http://modzer0.cs.uaf.edu/~dev2c/wiki/How_to_mirror_Wikipedia)

CrimsonBizarre
May 23rd, 2010, 09:03 AM
Could be worse: I've seen screenshots of people experiencing epic failures trying to download the whole internet.

http://www.jimbarr.net/idownload/downloadwww.gif

Legendary_Bibo
May 23rd, 2010, 09:54 AM
Oh I just got the xml file whoops. Well this is going to be a project.

ramnarayan
May 23rd, 2010, 12:06 PM
Latest complete dump of English Wikipedia:
http://dumps.wikimedia.org/enwiki/20100130/



For those really interested in an offline wiki that is backed up with updates:

http://www.idasystems.net/wikireader

This is still under development, and it's open source (both hardware and software) :-)

ram

lisati
May 23rd, 2010, 12:09 PM
http://www.jimbarr.net/idownload/downloadwww.gif

Thanks. I had a quick look for that graphic but couldn't remember where I'd seen it....

Legendary_Bibo
May 23rd, 2010, 12:14 PM
http://www.jimbarr.net/idownload/downloadwww.gif
This is going to sound like a stupid question...maybe, but was someone really trying to download the internet?

new_tolinux
May 23rd, 2010, 01:05 PM
This is going to sound like a stupid question...maybe, but was someone really trying to download the internet?
There are some......
Internet Archive: Wayback Machine (http://www.archive.org/web/web.php)