[IDEA] Create an easy way to browse Wikipedia offline
zsouthboy
August 14th, 2007, 12:32 PM
http://www.softlab.ntua.gr/~ttsiod/buildWikipediaOffline.html
The tools are already made and available.
It would be nice to offer users something in "Add/Remove Programs" that (after warning them about the large download) downloads the Wikipedia dump and then installs and configures the aforementioned script.
You wouldn't believe how many times I've been somewhere with my laptop and no WiFi, where this would have come in handy for me personally.
Is this idea worth pursuing? Thoughts?
smartboyathome
August 14th, 2007, 11:09 PM
I'd say try packaging it and submitting it to the community repo. I might actually use this when studying on my college campus, where a lot of places have no internet connection right now.
Andruk Tatum
August 17th, 2007, 02:34 AM
I believe this was just posted on Slashdot.
Here's the link to the original article:
http://www.softlab.ntua.gr/~ttsiod/buildWikipediaOffline.html
Hope this helps. It would be really cool to make this easy to set up for regular users, and it would be cool for the OLPC project as well.
gnomeuser
August 17th, 2007, 03:16 AM
I'm sure that would please Wikipedia; their bandwidth was probably not planned with 6 million Ubuntu users basically rsyncing the entire database in mind. For that to work in any kind of moral way, Ubuntu would have to do regular syncs itself and distribute the database diffs over its own network so as not to abuse Wikipedia.
I wonder how big the Wikipedia database is, or rather, how small we can get it for this kind of distribution.
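A rough sketch of what "distributing the database diffs" could look like, assuming a simple per-article diff: the distributor keeps only a title-to-hash index of the previous snapshot and ships the added or changed articles plus a list of deleted titles. Everything here (`index_hashes`, `article_diff`, `apply_diff`, SHA-1 as the hash, the diff format itself) is hypothetical, not something Ubuntu or Wikimedia actually provides.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical per-article diff format: (upserts, deletions).
# upserts maps title -> full new text; deletions lists removed titles.


def index_hashes(snapshot: Dict[str, str]) -> Dict[str, str]:
    """Map each title to a SHA-1 of its text, so the old snapshot need not be kept."""
    return {title: hashlib.sha1(text.encode("utf-8")).hexdigest()
            for title, text in snapshot.items()}


def article_diff(old_index: Dict[str, str],
                 new: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return articles that are new or changed, plus titles that disappeared."""
    new_index = index_hashes(new)
    upserts = {title: new[title] for title, h in new_index.items()
               if old_index.get(title) != h}
    deletions = sorted(title for title in old_index if title not in new)
    return upserts, deletions


def apply_diff(local: Dict[str, str], upserts: Dict[str, str],
               deletions: List[str]) -> Dict[str, str]:
    """Client side: bring a local copy up to date with one diff."""
    updated = dict(local)
    for title in deletions:
        updated.pop(title, None)
    updated.update(upserts)
    return updated
```

Note that a one-character edit still ships the whole article in this scheme, so a real setup would probably want text-level deltas on top of it to keep the diffs small.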
Nekiruhs
August 17th, 2007, 08:01 AM
Quote: I wonder how big the Wikipedia database is, or rather, how small we can get it for this kind of distribution.
I don't remember the URL, but there was a site where you could download Wikipedia's current state as a 2.7 GB .bz2 file.
foerdi
August 17th, 2007, 09:27 AM
+1
gnomeuser
August 17th, 2007, 11:19 AM
Quote: I don't remember the URL, but there was a site where you could download Wikipedia's current state as a 2.7 GB .bz2 file.
Now imagine pushing even a small diff for that every, say, month or so. Our mirrors are going to love us.
GuidoCalvano
August 18th, 2007, 02:57 PM
And if you took only the English one?
And you don't have to redistribute the entire thing. You'd just have to do updates.
GuidoCalvano
August 18th, 2007, 03:02 PM
Pfff, from the looks of it, it's scary...
Best not...
http://download.wikimedia.org/enwiki/20070716/
http://en.wikipedia.org/wiki/Wikipedia:Database_download
GuidoCalvano
August 18th, 2007, 03:03 PM
Do note, though, that once it's in, the rest is just updating.
But still, it's probably waaaayyy too big.
yammosk
August 23rd, 2007, 05:32 AM
It would be great if this process could be made simpler. As a rookie, I was unable to follow the so-called "fast" way to build an offline Wikipedia. Since my university does not have full WLAN coverage yet, an offline Wikipedia would be great on my Feisty Fawn (or, later, Gutsy Gibbon) install.
UbuWu
August 24th, 2007, 04:57 PM
This is much easier: http://moulinwiki.org/
The English version is coming soon...
gnomeuser
August 24th, 2007, 07:51 PM
Quote: Do note, though, that once it's in, the rest is just updating. But still, it's probably waaaayyy too big.
Even a monthly diff of the Wikipedia data is going to be frikkin huge. If we are talking 2.7 GB for the whole thing and assume only 1% changes in a month, you are pushing 27 MB, which would still make it one of the largest packages in the repo. And 1% seems insultingly low by Wikipedia standards; they add and alter records at an impressive rate. Not to mention you'd have to do the initial push. Is anyone up for hosting a 2.7 GB download? Even if only a fraction of the Ubuntu userbase downloads it, we are still looking at a major problem. I guess we could use BitTorrent, but that just opens a whole can of worms in terms of rollout, security and such.
I don't think it's likely that we can distribute Wikipedia using our regular update system; it would tax the mirrors, and the gain isn't really that great compared to just using the existing service or any of the many programs that monitor articles of interest.
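The back-of-the-envelope numbers above as a tiny Python calculation. It assumes the diff grows linearly with the fraction of the dump that changes (ignoring diff-format overhead and how well a diff compresses) and reuses the thread's figures of 2.7 GB and 6 million users; the 5% and 10% rates are purely illustrative.

```python
def monthly_diff_mb(dump_gb: float, change_rate: float) -> float:
    """Estimated diff size in MB (decimal units), assuming linear scaling with the changed fraction."""
    return dump_gb * 1000 * change_rate


def mirror_traffic_tb(users: int, diff_mb: float) -> float:
    """Total transfer in TB if every user pulls one diff of diff_mb megabytes."""
    return users * diff_mb / 1_000_000


if __name__ == "__main__":
    for rate in (0.01, 0.05, 0.10):
        diff = monthly_diff_mb(2.7, rate)
        print(f"{rate:.0%} changed: ~{diff:.0f} MB diff, "
              f"~{mirror_traffic_tb(6_000_000, diff):.0f} TB if 6 million users fetch it")
```

At the 1% rate this works out to about 27 MB per diff, or roughly 162 TB of mirror traffic per month if all 6 million users fetched it.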
r3m0t
August 25th, 2007, 08:22 PM
Why do people need the latest and greatest of every article?
As of October 2006, 20.5% of enwiki articles were less than half a kilobyte (trends suggest about 17% now). They're usually not very useful; cutting them out could save about 0.2 GB.
Then cut out all the meta-pages (the description says there are a few), all data except the titles and texts, and the interwiki links (the ones that tell you where to find the same article in other languages... no thanks), and you could save a bit more space.
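A minimal sketch of that trimming in Python, streaming the .bz2 dump with `bz2` and `xml.etree.ElementTree.iterparse` so the whole file never has to fit in memory. Assumptions: the dump uses the usual MediaWiki `<page>`/`<title>`/`<text>` export layout, "half a kilobyte" means 512 bytes of UTF-8, meta-pages are spotted by a hypothetical list of title prefixes (`META_PREFIXES`), and interwiki links are removed with a rough regex that may miss or over-match some links.

```python
import bz2
import re
import sys
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple

# Hypothetical list of title prefixes treated as meta-pages; a real filter
# would read the namespace list from the dump's <siteinfo> section.
META_PREFIXES = ("Talk:", "User:", "User talk:", "Wikipedia:", "Wikipedia talk:",
                 "Image:", "Template:", "Help:", "Category:", "Portal:", "MediaWiki:")

# Rough heuristic for interwiki links such as [[de:Artikel]] or [[zh-min-nan:...]].
INTERWIKI = re.compile(r"\[\[[a-z]{2,3}(?:-[a-z]+)*:[^\]\n]*\]\]\n?")

MIN_BYTES = 512  # "less than half a kilobyte", measured here as UTF-8 bytes


def local(tag: str) -> str:
    """Strip the XML namespace, since the export schema version varies."""
    return tag.rsplit("}", 1)[-1]


def slim_pages(stream: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) for non-meta articles of at least MIN_BYTES, minus interwiki links."""
    title: Optional[str] = None
    text = ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title and not title.startswith(META_PREFIXES):
                body = INTERWIKI.sub("", text)
                if len(body.encode("utf-8")) >= MIN_BYTES:
                    yield title, body
            elem.clear()  # free memory held by the processed page
            title, text = None, ""


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slim.py pages-articles.xml.bz2  (the file name is only an example)
    with bz2.open(sys.argv[1], "rb") as dump:
        kept = sum(1 for _ in slim_pages(dump))
    print(f"kept {kept} articles")
```

Only titles and texts survive, which covers the "all data except the titles and texts" part; writing them back out in whatever format the offline reader expects is left open.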
This isn't really Ubuntu's thing though.
UbuWu
August 26th, 2007, 02:47 PM
Here is a listing of all ways to browse wikipedia offline: http://intelligentdesigns.net/blog/?p=73