GCHQ goes Google

Eugen Leitl eugen at leitl.org
Tue Nov 9 08:16:53 PST 2010


http://www.theregister.co.uk/2010/11/08/gchq_google/

GCHQ goes Google

Net spies turn to MapReduce

By Chris Williams 

Posted in Government, 8th November 2010 13:36 GMT

Britain's digital spies have turned to Google for help making sense of the
floods of data now inundating their powerful computing resources.

GCHQ, the Cheltenham-based signals intelligence agency, is recruiting an
expert on MapReduce, the patented number-crunching technique previously
behind the dominant web search engine.

The agency's new lead researcher on data mining will be responsible for
"developing MapReduce analytics on parallel computing clusters", a job
advertisement reveals.

MapReduce was developed by Google to index billions of web pages across its
cluster of hundreds of thousands of commodity servers. It breaks up
complicated tasks into smaller, easier computing problems that cheap hardware
is capable of solving quickly.
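
As a rough illustration of that split, the sketch below counts words in a
handful of documents using the three stages usually associated with the
technique: map, shuffle and reduce. It is a single-process toy, not Google's
or Hadoop's implementation, and the function names (map_phase, shuffle,
reduce_phase, word_count) are hypothetical ones chosen for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each document could be mapped on a different
    # machine; this sketch simply loops over them in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    # Each key could likewise be reduced independently of the others.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["spies like data", "data about data"]))
```

Because each map call sees only one document and each reduce call sees only
one key, both stages can in principle be spread across many cheap machines.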

Google patented the technique earlier this year, but it remains free for
other organisations to adopt via Hadoop, an open source project. Originally
described in a 2004 research paper, MapReduce has allowed Google's algorithms
to index a rapidly expanding web while keeping costs down.

GCHQ faces a similar challenge as it gathers more and more raw data from
internet communications, including email, social networks and VoIP.

"Successful data-driven organisations must be able to process, interpret and
rapidly respond to indicators derived from unprecedented volumes of data from
disparate information sources," its recruitment advertisement says.

The Register understands that GCHQ now has a cluster of more than 250,000
commodity servers under its Cheltenham "doughnut" building. In recent years
it has developed this Google-style infrastructure instead of the very
expensive, bespoke supercomputers it used to analyse microwave intercepts
during the Cold War.

While spies are planning research on MapReduce, Google has already moved on
to BigTable, its new distributed database.




