web.archive.org Internet archive to open ---google + archeology
Hey Mitch -- Another part of your permanent record

http://www.latimes.com/news/nationworld/nation/la-102501archive.story
By JOSEPH MENN, Times Staff Writer

SAN FRANCISCO -- An Internet archive containing more text than any library in history will open its digital doors today, giving researchers and the public access to just about everything posted on the World Wide Web over the last five years.

The free archive, created by a San Francisco computer entrepreneur named Brewster Kahle, allows academics to conduct the electronic equivalent of archeological digs, rooting through reams of material illustrating the evolution of the Web and its role in American society.

The Internet Archive, informally called the Wayback Machine, holds more than 10 billion Web pages dating to 1996, including millions that had vanished as dot-coms collapsed, big companies scaled back or updated their offerings, and hobbyist Webmasters lost interest.

Researchers and academics have likened Kahle to a modern-day Andrew Carnegie, the steel baron who endowed many of the nation's finest libraries.

"Libraries are dedicated to collecting and making available the permanent historical record," said Diane Kresh, the Library of Congress' director for public service collections. She said trolling the Net is as significant as gathering books or periodicals.

Want to see what the Heaven's Gate cult page looked like before the group's mass suicide? There it is. Want to see how Yahoo's pages have changed since 1996? Step this way. Pages published by everyone from Fortune 500 companies to renegade porn merchants are stashed in the Internet Archive.

The five-year, multimillion-dollar project has amassed five times as much text as the Library of Congress, which helped fund the archive along with Compaq Computer Corp., the National Science Foundation and the Smithsonian Institution.

The more-than 100 terabytes of data are housed on 300 modified Hewlett-Packard desktop computers in a basement at San Francisco's Presidio.

The effort to record Internet history has been directed and largely financed by Kahle, a 41-year-old former supercomputer technologist who sold one Web firm to America Online and another to Amazon.com.

"The opportunity of our time is to offer universal access to all of human knowledge," Kahle said Wednesday from his office in the Presidio, a decommissioned military base near the Golden Gate Bridge. "We're at a unique point in time to offer universal access to anyone who walks into a library in Uganda."

The Internet Archive uses automated "bots" to scour the Web. They capture sites and return what they find to the computers at the Presidio. The archive updates every two months.

Once captured, the sites are organized chronologically. Users type in a Web address, and the archive displays versions of that site since 1996.

Sites that require passwords or block bots are not captured. And if someone objects to their site being copied, the archive removes it.

As smaller, less accessible versions of the archive were being compiled, Kahle's 30 staffers got a few complaints. After the staff explained that it wasn't personal, that they were copying everyone's sites, the vast majority decided they didn't mind, Kahle said.

"Most people say, 'You're crazy, but go for it,' " Kahle said. "People want to be part of history."

Candidates to use the service, at web.archive.org, include academics, journalists and researchers.

"It will allow researchers to study the evolution of the Web in a way that is unprecedented," said research scientist Ed Chi of the Xerox Palo Alto Research Center. He said Xerox PARC scientists already are working on new user interfaces based on what the archive showed them about how people looked for information.

Early on, "we suspect people will go look for their own pages and see if they can get copies of things that they've lost," Kahle said. "We're not exactly sure how this is going to be used. We're looking forward to being surprised."

Like many Internet pioneers, however, Kahle faces unfamiliar risks along with the opportunities. The Internet Archive may be a massive violation of copyright law.

"Brewster is taking an extraordinarily personal risk, because this is potentially a criminal offense," said Lawrence Lessig, an expert on intellectual property in cyberspace at Stanford University.

Kahle doesn't anticipate getting sued, let alone serving jail time. His plan is to post whatever he can--and keep the archive growing.

"We're not here to test laws," Kahle said. "We're trying to build a world we want to live in. The world without a library is a world without a memory, and that would be tragic."

The legal questions may take years to resolve, Kahle and Lessig said.

Consider the Industry Standard. At least some of that defunct magazine's articles are back online through Kahle's archive. But shareholder IDG paid more than $1 million for the Standard's assets, including rights to those stories. An IDG spokeswoman declined to say whether the company would ask the archive to drop the articles.

Kahle said he isn't worrying about the hypotheticals. He's more excited about finding early www.whitehouse.gov pages from 1996 that dealt with airport safety and bioterrorism.

Even better is what's to come.

"The woman who is going to be elected president in 2024 is in high school now, and I bet she has a home page," Kahle said. "We have the future president's home page!"
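The article's description of how the archive behaves (captures organized chronologically, lookup by Web address, removal when a site owner objects) can be sketched as a toy data structure. This is only an illustration under stated assumptions: the class `ToyArchive` and all of its method names are hypothetical, and nothing here reflects how the Internet Archive is actually implemented.

```python
from bisect import insort
from collections import defaultdict
from datetime import date


class ToyArchive:
    """Hypothetical in-memory stand-in for a crawl archive (illustration only)."""

    def __init__(self):
        # Web address -> list of (capture_date, page_text), kept sorted by date.
        self._captures = defaultdict(list)
        # Addresses whose owners objected to being copied.
        self._excluded = set()

    def capture(self, url, when, text):
        """Store one crawl result unless the site has been excluded."""
        if url in self._excluded:
            return False
        insort(self._captures[url], (when, text))  # keeps chronological order
        return True

    def exclude(self, url):
        """Honor an objection: drop stored copies and skip future captures."""
        self._excluded.add(url)
        self._captures.pop(url, None)

    def versions(self, url):
        """Return the capture dates for an address, oldest first."""
        return [when for when, _ in self._captures.get(url, [])]


if __name__ == "__main__":
    archive = ToyArchive()
    # Captures may arrive out of order; the archive still lists them by date.
    archive.capture("example.com", date(2001, 10, 1), "later version")
    archive.capture("example.com", date(1996, 12, 1), "early version")
    print(archive.versions("example.com"))
    archive.exclude("example.com")
    print(archive.versions("example.com"))
```

Keeping each address's captures sorted at insert time makes the "versions of that site since 1996" lookup a simple read, at the cost of slightly slower inserts.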
Subcommander Bob wrote: <cut>
http://www.latimes.com/news/nationworld/nation/la-102501archive.story By JOSEPH MENN, Times Staff Writer
SAN FRANCISCO -- An Internet archive containing more text than any library in history will open its digital doors today, giving researchers and the public access to just about everything posted on the World Wide Web over the last five years.
Way cool. It needs to be mirrored, though. Single point of failure/distribution invites history being rewritten the way it always has been until now.

jbdigriz
That's fine, I've got a 100TB server in my attic you can use if you want? ;)

point taken though.

tolan.

-----Original Message-----
From: owner-cypherpunks@ssz.com [mailto:owner-cypherpunks@ssz.com]On Behalf Of James B. DiGriz
Sent: 25 October 2001 17:01
To: cypherpunks@einstein.ssz.com
Subject: CDR: Re: web.archive.org Internet archive to open ---google + archeology

Subcommander Bob wrote: <cut>
Way cool. It needs to be mirrored, though. Single point of failure/distribution invites history being rewritten the way it always has been until now. jbdigriz
It's not as outrageous as you'd think. 100GB drives are around $200, which means that a terabyte will cost you about $3K if you throw in a PC and some networking gear to connect it, so you could replicate that in your basement next to your DES-cracker for about the same price - the more expensive problem is getting the fiber optic connection from the Presidio to your basement to keep it updated.

More to the point, recent news articles say the Feds have been getting Google to delete things for them.
http://www.inet-one.com/cypherpunks/current/msg00505.html

Anybody know what's been deleted, and whether it's still in Wayback, and whether we can get copies out into the public before anyone pressures Brewster Kahle?

At 06:10 PM 10/25/2001 +0100, Tolan Blundell wrote:
That's fine, I've got a 100TB server in my attic you can use if you want? ;)
jbdigriz: Way cool. It needs to be mirrored, though. Single point of failure/distribution invites history being rewritten the way it always has been until now.
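The back-of-envelope estimate in the reply above (100GB drives at about $200, roughly $3K per terabyte once a PC and networking gear are included, applied to the article's 100 terabytes) can be worked through as a short calculation. This is a rough sketch, not a real hardware budget: the function names are made up for illustration, decimal units (1 TB = 1000 GB) are assumed, and bandwidth costs are left out because the post gives no figure for them.

```python
import math

# Figures quoted in the thread (2001 prices).
DRIVE_GB = 100            # capacity of one drive
DRIVE_COST_USD = 200      # "around $200" per drive
COST_PER_TB_USD = 3000    # "about $3K" per terabyte, PC and networking included
ARCHIVE_TB = 100          # "more-than 100 terabytes" per the article


def drives_per_terabyte(drive_gb=DRIVE_GB):
    """Drives needed for one terabyte, assuming 1 TB = 1000 GB."""
    return math.ceil(1000 / drive_gb)


def raw_disk_cost_per_tb(drive_gb=DRIVE_GB, drive_cost=DRIVE_COST_USD):
    """Cost of the bare drives for one terabyte, with no PC or network gear."""
    return drives_per_terabyte(drive_gb) * drive_cost


def mirror_cost(archive_tb=ARCHIVE_TB, cost_per_tb=COST_PER_TB_USD):
    """Total hardware cost of one full copy at the quoted per-terabyte price."""
    return archive_tb * cost_per_tb


if __name__ == "__main__":
    disks = raw_disk_cost_per_tb()
    print(f"drives per TB:        {drives_per_terabyte()}")
    print(f"bare disks per TB:    ${disks:,}")
    print(f"PC + network per TB:  ${COST_PER_TB_USD - disks:,}")
    print(f"one {ARCHIVE_TB}TB mirror:     ${mirror_cost():,}")
```

With these figures the bare disks come to $2,000 per terabyte, leaving about $1,000 per terabyte for the PC and networking gear, and one full mirror costs on the order of $300,000 in hardware.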
participants (4)
- Bill Stewart
- James B. DiGriz
- Subcommander Bob
- Tolan Blundell