robust de-anonymization of large datasets (arXiv)

Mon Nov 26 11:50:02 PST 2007

	http://arxiv.org/PS_cache/cs/pdf/0610/0610105v2.pdf

Robust De-anonymization of Large Datasets
(How to Break Anonymity of the Netflix Prize Dataset)

Arvind Narayanan and Vitaly Shmatikov

The University of Texas at Austin

November 22, 2007

Abstract

We present a new class of statistical de-anonymization attacks against high-dimensional micro-data,
such as individual preferences, recommendations, transaction records and so on. Our techniques are
robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge.
We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous
movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We
demonstrate that an adversary who knows only a little bit about an individual subscriber can easily
identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of
background knowledge, we successfully identified the Netflix records of known users, uncovering their
apparent political preferences and other potentially sensitive information.
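
To make the idea in the abstract concrete, here is a minimal toy sketch of noise-tolerant record matching. It is not the authors' algorithm: the scoring rule, the names score and best_match, the tolerance and min_gap parameters, and the sample data are all assumptions made up for illustration. Each candidate record is scored by how many of the adversary's known ratings it approximately matches, and a match is reported only when the best candidate clearly stands out from the runner-up.

```python
from typing import Dict, Optional

# A record maps a movie title to a rating (1-5). All data here is hypothetical.
Record = Dict[str, int]


def score(aux: Record, record: Record, tolerance: int = 1) -> int:
    """Count the auxiliary ratings that approximately match a candidate record.

    A rating matches if the movie appears in the record and the two ratings
    differ by at most `tolerance` (an assumed way to tolerate noisy knowledge).
    """
    return sum(
        1
        for movie, rating in aux.items()
        if movie in record and abs(record[movie] - rating) <= tolerance
    )


def best_match(
    aux: Record, dataset: Dict[str, Record], min_gap: int = 2
) -> Optional[str]:
    """Return the id of the best-scoring record, or None if it is ambiguous.

    The match is accepted only if the top score exceeds the second-best score
    by at least `min_gap` (an assumed, simplistic "stands out" criterion).
    """
    scores = sorted(
        ((score(aux, rec), rid) for rid, rec in dataset.items()), reverse=True
    )
    if not scores:
        return None
    top_score, top_id = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0
    return top_id if top_score - runner_up >= min_gap else None


# Toy "anonymized" dataset keyed by opaque record ids.
dataset = {
    "r1": {"Movie A": 5, "Movie B": 1, "Movie C": 4, "Movie D": 2},
    "r2": {"Movie A": 3, "Movie B": 3, "Movie E": 5},
    "r3": {"Movie C": 4, "Movie D": 5, "Movie F": 1},
}

# The adversary's imperfect background knowledge about one person.
aux = {"Movie A": 4, "Movie B": 1, "Movie C": 5}

print(best_match(aux, dataset))
print(best_match({"Movie A": 4}, dataset))
```

With only three slightly wrong ratings, the sketch singles out one record in this toy dataset, while a single known rating is rejected as ambiguous. A realistic attack would also have to weight rare movies more heavily than popular ones and use rating dates, which this sketch ignores.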

-- 
Eugen* Leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE