At 01:46 PM 12/31/02 -0800, Bill Stewart wrote: ...
The scalability of the problem is much different depending on your goals. If you want to sort through the transcriptions of people who bought drugs and knives and airline tickets but no luggage in an effort to find potential terrorists, that's useless.
But if you've already got a suspect, like a Green Party member who wrote an annoyed letter to the President and threatened to tell her Congresscritter in person what a bad President he is, ...
It's worth pointing out that if you can afford to do the computerized part of this search for your top 16 suspects today, you'll be able to do it for your top thousand suspects in less than ten years, just assuming processing and storage get cheaper at current rates....
--John Kelsey, kelsey.j@ix.netcom.com
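The "less than ten years" figure can be checked with a back-of-the-envelope sketch. The 18-month doubling period for capacity per dollar is an assumption (a Moore's-law-style rate, not a number stated in the message), and `years_to_scale` is a hypothetical helper name:

```python
import math

# Assumption: capacity per dollar doubles every 18 months.
# The message only says "current rates"; this figure is illustrative.
DOUBLING_YEARS = 1.5


def years_to_scale(current, target, doubling_years=DOUBLING_YEARS):
    """Years until a budget that handles `current` suspects handles `target`."""
    doublings = math.log2(target / current)
    return doublings * doubling_years


years = years_to_scale(16, 1000)
print(f"16 -> 1000 suspects: about {years:.1f} years")
```

Going from 16 to 1000 suspects takes just under six doublings, so under this assumed rate the estimate lands a little short of nine years. A faster doubling period shortens it proportionally.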
On Wed, 1 Jan 2003, John Kelsey wrote:
It's worth pointing out that if you can afford to do the computerized part of this search for your top 16 suspects today, you'll be able to do it for your top thousand suspects in less than ten years, just assuming processing and storage get cheaper at current rates....
I think you're being very conservative here. You can package several GBytes of memory and about a TByte worth of EIDE RAID drives into a 1U system with dual GBit Ethernet. A single facility with a redundancy pool of spares could contain 10^3..10^4 nodes, running for about a megabuck/year for juice and air conditioning.

10 PByte of nonvolatile storage and ~40 TByte of RAM accessed by dual CPUs could easily run data mining on the entire Earth's population (in reality, only the fraction of it that generates traffic will be of interest), especially if they run custom database code out of core and use nonvolatile storage mostly as libraries.

Assuming there are some 100*10^6 users, each sending 1 kByte of pure-text email per day, a single hard drive will hold a day's worth of the world's email traffic, uncompressed. Good-quality human voice compresses to about 1.5 kByte/s. The above assembly could store about 3 hours of 100 million people jabbering simultaneously. You can of course also run voice recognition, either in realtime or as batch processing of selected stuff from the library.

That's the theory; no one knows who is running what, where.
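The arithmetic above can be reproduced with a rough sketch. Decimal units (1 kByte = 1000 bytes), the 4 GByte per node reading of "several GBytes", and the helper names are assumptions for illustration; the user count, message size, voice bitrate and node counts come from the message:

```python
# Decimal units are an assumption; the message does not say which it uses.
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15

USERS = 100 * 10**6            # 100*10^6 users, from the message
EMAIL_BYTES_PER_DAY = 1 * KB   # 1 kByte of text email per user per day
VOICE_BYTES_PER_SEC = 1500     # about 1.5 kByte/s of compressed voice
DISK_PER_NODE = 1 * TB         # about a TByte of disk per 1U node
RAM_PER_NODE = 4 * GB          # assumption: "several GBytes" taken as 4


def daily_email_bytes(users=USERS, per_user=EMAIL_BYTES_PER_DAY):
    """Total uncompressed email volume per day."""
    return users * per_user


def voice_hours(storage_bytes, speakers=USERS, rate=VOICE_BYTES_PER_SEC):
    """Hours of simultaneous speech that fit in the given storage."""
    return storage_bytes / (speakers * rate) / 3600


print(f"email per day: {daily_email_bytes() / GB:.0f} GByte")
for nodes in (10**3, 10**4):
    disk = nodes * DISK_PER_NODE
    ram = nodes * RAM_PER_NODE
    print(f"{nodes} nodes: {disk / PB:.0f} PByte disk, {ram / TB:.0f} TByte RAM, "
          f"{voice_hours(disk):.1f} h of voice")
```

Under these assumptions a day of email is about 100 GByte, and the voice capacity comes out at roughly 2 hours for 10^3 nodes and roughly 18 hours for 10^4 nodes, so the "about 3 hours" figure sits near the low end of that range.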