web.archive.org Internet archive to open ---google + archeology

Subcommander Bob bob at black.org
Thu Oct 25 07:45:13 PDT 2001


Hey Mitch -- Another part of your permanent record

http://www.latimes.com/news/nationworld/nation/la-102501archive.story
By JOSEPH MENN, Times Staff Writer

SAN FRANCISCO -- An Internet archive containing more text than any
library in history will open its digital doors today, giving researchers
and the public access to just about everything posted on the World Wide
Web over the last five years.

The free archive, created by a San Francisco computer entrepreneur named
Brewster Kahle, allows academics to conduct the electronic equivalent of
archeological digs, rooting through reams of material illustrating the
evolution of the Web and its role in American society.

The Internet Archive, informally called the Wayback Machine, holds more
than 10 billion Web pages dating to 1996, including millions that had
vanished as dot-coms collapsed, big companies scaled back or updated
their offerings, and hobbyist Webmasters lost interest.

Researchers and academics have likened Kahle to a modern-day Andrew
Carnegie, the steel baron who endowed many of the nation's finest
libraries.

"Libraries are dedicated to collecting and making available the
permanent historical record," said Diane Kresh, the Library of Congress'
director for public service collections. She said trolling the Net is as
significant as gathering books or periodicals.

Want to see what the Heaven's Gate cult page looked like before the
group's mass suicide? There it is. Want to see how Yahoo's pages have
changed since 1996? Step this way. Pages published by everyone from
Fortune 500 companies to renegade porn merchants are stashed in the
Internet Archive.

The five-year, multimillion-dollar project has amassed five times as
much text as the Library of Congress, which helped fund the archive
along with Compaq Computer Corp., the National Science Foundation and
the Smithsonian Institution. The more than 100 terabytes of data are
housed on 300 modified Hewlett-Packard desktop computers in a basement
at San Francisco's Presidio.

The effort to record Internet history has been directed and largely
financed by Kahle, a 41-year-old former supercomputer technologist who
sold one Web firm to America Online and another to Amazon.com.

"The opportunity of our time is to offer universal access to all of
human knowledge," Kahle said Wednesday from his office in the Presidio,
a decommissioned military base near the Golden Gate Bridge. "We're at a
unique point in time to offer universal access to anyone who walks into
a library in Uganda."

The Internet Archive uses automated "bots" to scour the Web. They
capture sites and return what they find to the computers at the
Presidio. The archive updates every two months. Once captured, the sites
are organized chronologically. Users type in a Web address, and the
archive displays versions of that site since 1996.
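
The lookup described above (captures organized chronologically, retrieved by Web address) can be pictured with a small sketch. This is only an illustration, not the archive's actual design, which the article does not describe: the class name SnapshotIndex, its methods, and the example.com address are all hypothetical, and a real system at this scale would not hold its index in memory.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime


class SnapshotIndex:
    """Toy in-memory index (hypothetical): URL -> captures in time order."""

    def __init__(self):
        # url -> list of (capture time, page content), kept sorted by time
        self._captures = defaultdict(list)

    def add(self, url, captured_at, content):
        # insort places the new capture so the list stays chronological
        insort(self._captures[url], (captured_at, content))

    def versions(self, url):
        """Return the capture times stored for url, oldest first."""
        return [when for when, _ in self._captures.get(url, [])]


index = SnapshotIndex()
# Captures can arrive in any order; the index sorts them by date.
index.add("example.com", datetime(2001, 10, 1), "<html>later</html>")
index.add("example.com", datetime(1996, 12, 1), "<html>earlier</html>")
for when in index.versions("example.com"):
    print(when.date())
```

Keeping each address's captures sorted at insert time means that listing a site's versions is a plain read, which matches the "type in an address, see its history" use the article describes.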

Sites that require passwords or block bots are not captured. And if
someone objects to their site being copied, the archive removes it.
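
The article does not say how a site "blocks bots." One common convention is a robots.txt file, and the sketch below assumes that mechanism. It uses Python's standard urllib.robotparser module; the function name may_capture, the user-agent string "example-archiver", and the sample rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def may_capture(robots_txt, user_agent, url):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: every bot is asked to stay out of /private/.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

for page in ("http://example.com/index.html",
             "http://example.com/private/page.html"):
    allowed = may_capture(SAMPLE_RULES, "example-archiver", page)
    print(page, "capture" if allowed else "skip")
```

A crawler that honors such rules would run a check like this before each fetch and simply skip the pages it is asked to avoid.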

As smaller, less accessible versions of the archive were being compiled,
Kahle's 30 staffers got a few complaints. After the staff explained that
it wasn't personal, that they were copying everyone's sites, the vast
majority decided they didn't mind, Kahle said.

"Most people say, 'You're crazy, but go for it,' " Kahle said. "People
want to be part of history."

Candidates to use the service, at web.archive.org, include academics,
journalists and researchers.

"It will allow researchers to study the evolution of the Web in a way
that is unprecedented," said research scientist Ed Chi of the Xerox Palo
Alto Research Center. He said Xerox PARC scientists already are working
on new user interfaces based on what the archive showed them about how
people looked for information.

Early on, "we suspect people will go look for their own pages and see if
they can get copies of things that they've lost," Kahle said. "We're not
exactly sure how this is going to be used. We're looking forward to
being surprised."

Like many Internet pioneers, however, Kahle faces unfamiliar risks along
with the opportunities. The Internet Archive may be a massive violation
of copyright law.

"Brewster is taking an extraordinarily personal risk, because this is
potentially a criminal offense," said Lawrence Lessig, an expert on
intellectual property in cyberspace at Stanford University.

Kahle doesn't anticipate getting sued, let alone serving jail time. His
plan is to post whatever he can--and keep the archive growing.

"We're not here to test laws," Kahle said. "We're trying to build a
world we want to live in. The world without a library is a world without
a memory, and that would be tragic."

The legal questions may take years to resolve, Kahle and Lessig said.

Consider the Industry Standard. At least some of that defunct magazine's
articles are back online through Kahle's archive. But shareholder IDG
paid more than $1 million for the Standard's assets, including rights to
those stories. An IDG spokeswoman declined to say whether the company
would ask the archive to drop the articles.

Kahle said he isn't worrying about the hypotheticals. He's more excited
about finding early www.whitehouse.gov pages from 1996 that dealt with
airport safety and bioterrorism.

Even better is what's to come.

"The woman who is going to be elected president in 2024 is in high
school now, and I bet she has a home page," Kahle said. "We have the
future president's home page!"




