[Smt-talk] John McKay's views of Wikipedia

Steve Haflich smh at franz.com
Thu Jul 21 21:02:47 PDT 2011


Michael Gogins <michael.gogins at gmail.com> wrote:

   This thing about the ephemerality of the World Wide Web may or may not
   be the case. The fact is that all the servers where the WWW content
   actually resides, including esp. such as the Wikipedia, are constantly
   being backed up, i.e. archived. The archives are then stored somewhere
   where somebody who needs to can access the archived content. But this
   is all private, all undocumented, and after some time the tapes or
   whatever are reused or go bad or are thrown out.
   ...
   I think it will be a real sign of maturity for our civilization if
   everything, literally everything, that ever appears on the public WWW
   is permanently and redundantly archived in public, accessible,
   searchable form. This is well within technical possibility.

I feel this isn't an accurate description of Wikipedia ephemerality.

It is certainly true that Wikipedia articles can be edited without
warning at any time, and that some edits are detrimental according to
your biases, and maybe mine, but not some other party's.  But the
content is usually self-correcting, somehow.

But the Wikipedia is not ephemeral!  There are several projects that
make the entire Wikipedia available for download, or (even better) as a
Semantic Web database (aka RDF) that can be used as a linked reference.
See for example dbpedia.org.

So anyone who wants to do so, and who has sufficient computer resources
and bandwidth, can download the entire Wikipedia and burn it to DVD.
Exact estimates are hard to come by, but I guess the entire current
English Wikipedia is a mere several tens of gigabytes.  An external 2
terabyte disk drive these days costs only about $100.

But archiving copies of Wikipedia is not even necessary.  There is the
Internet Archive's Wayback Machine, which periodically crawls the
entire web and captures a snapshot, ideally every few months.  This
includes Wikipedia, and saved pages go back to 2002.  The following
page records the Wikipedia article on "Music" from October 2006.

http://web.archive.org/web/20061014042747/http://en.wikipedia.org/wiki/Music
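
A snapshot address like the one above carries its own capture time: the
14 digits after /web/ are a YYYYMMDDhhmmss timestamp, and the rest is the
address of the original page.  Here is a minimal Python sketch that takes
such an address apart.  The function name split_wayback_url and the
constant WAYBACK_PREFIX are my own, made up for illustration; they are
not part of any archive.org tool.

```python
from datetime import datetime

# Prefix shared by snapshot addresses of the form shown above.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def split_wayback_url(snapshot_url):
    """Split a snapshot address into (capture time, original URL).

    Hypothetical helper: assumes the 14-digit YYYYMMDDhhmmss timestamp
    form seen in the address above.
    """
    if not snapshot_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a Wayback snapshot URL")
    rest = snapshot_url[len(WAYBACK_PREFIX):]
    # Everything up to the first slash is the timestamp; the remainder
    # is the address of the page that was captured.
    stamp, _, original = rest.partition("/")
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return captured, original


captured, original = split_wayback_url(
    "http://web.archive.org/web/20061014042747/http://en.wikipedia.org/wiki/Music"
)
print(captured.isoformat(), original)
```

For the address above this gives a capture time of 14 October 2006,
04:27:47, and the original page http://en.wikipedia.org/wiki/Music.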

The problem sometimes is not that a page will be lost; it is that
sometimes it won't!

-- 
Steven M. Haflich <smh at franz.com>
academically unaffiliated
