How Private Can Electronic Data Ever Be?

R.A. Hettinga rah at shipwright.com
Sun Oct 18 04:20:22 PDT 2009


<http://www.nytimes.com/2009/10/18/business/18stream.html?th=&emc=th&pagewanted=print>

The New York Times

October 18, 2009
SLIPSTREAM
When 2+2 Equals a Privacy Question
By NATASHA SINGER

TIME to revisit the always compelling, and often disconcerting,
debate over digital privacy. So, what might your movie picks and your
medical records have in common?

How about a potentially false sense of control over who can see your
user history?

While Netflix and some health care concerns say they have been able to
offer study data to researchers stripped of specific personal details
like your name, phone number and e-mail address, in some cases
researchers may be able to re-identify you by correlating anonymous
information with the digital trail that you've left on blogs, chat
rooms and Twitter.

Of course, you may be fine with that. On the other hand, you may not
want complete strangers rummaging around in your history of movie
selections or medical needs.

For example, contestants in Netflix's competition to improve its
recommendation software received a training data set containing the
movie preferences of more than 480,000 customers who had, as they say
in the trade, been "de-identified." But as part of a privacy
experiment, a pair of computer scientists at the University of Texas
at Austin decided to see if it was possible to re-identify those
unnamed movie fans.

By comparing the film preferences of some anonymous Netflix customers
with personal profiles on imdb.com, the Internet movie database, the
researchers said they easily re-identified some people because they
had posted their e-mail addresses or other distinguishing information
online.
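
To make the matching idea concrete, here is a minimal sketch of how a
public profile might be compared against anonymous rating records. It
is not the researchers' actual method: the scoring rule, the
thresholds (min_score, min_gap), the user IDs and the film titles are
all hypothetical, invented for illustration.

```python
# Hypothetical anonymous rating records: an opaque ID mapped to {title: stars}.
anon_profiles = {
    "user_0417": {"Film A": 5, "Film B": 2, "Film C": 4, "Film D": 1},
    "user_2290": {"Film A": 4, "Film E": 3, "Film F": 5},
    "user_8813": {"Film B": 5, "Film C": 1, "Film G": 4},
}

# Hypothetical ratings that a named person posted publicly.
public_profile = {"Film A": 5, "Film B": 2, "Film C": 5, "Film D": 1, "Film H": 3}


def overlap_score(anon_ratings, public_ratings):
    """Count titles rated in both profiles whose ratings differ by at most one star."""
    shared = anon_ratings.keys() & public_ratings.keys()
    return sum(1 for title in shared
               if abs(anon_ratings[title] - public_ratings[title]) <= 1)


def best_match(public_ratings, profiles, min_score=3, min_gap=2):
    """Return the anonymous ID that clearly stands out as a match, or None.

    A candidate is accepted only if its score is at least min_score and it
    beats the runner-up by at least min_gap; both thresholds are assumptions.
    """
    scored = sorted(
        ((overlap_score(ratings, public_ratings), uid)
         for uid, ratings in profiles.items()),
        reverse=True,
    )
    if not scored:
        return None
    top_score, top_id = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0
    if top_score >= min_score and top_score - runner_up >= min_gap:
        return top_id
    return None


print(best_match(public_profile, anon_profiles))
```

The gap requirement is the design choice that matters here: a match
counts only when one anonymous record stands well apart from all the
others, so a profile with too few ratings in common yields no match
instead of a wrong one.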

Vitaly Shmatikov, an associate professor of computer science at the
University of Texas at Austin and a co-author of the
de-anonymization study, says the researchers were able to analyze users'
public postings and connect that to their Netflix preferences,
including how a person may have rated films with controversial themes.
"Those are choices a person may or may not want to make public," Mr.
Shmatikov said.

Steve Swasey, a Netflix spokesman, disputed the study's conclusions,
saying the customers were not re-identifiable because Netflix had
altered the data set before sending it to contestants.

"There is no way with certainty that anyone could link a Netflix
member with the data Netflix has disclosed by linking it with any
publicly available data," he said. "The anonymity of the information
is comparable to the strictest federal standards for anonymizing
personal health information."

Nevertheless, the Texas researchers say they were indeed able to
positively identify Netflix customers, and some privacy advocates say
their study raises questions about whether newly strengthened laws
governing the security of electronic health records, which contain
information on diagnoses and treatments entered by health care
providers, may offer incomplete privacy protection. Leaked movie
preferences might embarrass or stereotype you, they said. But
information extracted from medical records and then linked back to
you, they said, has the potential to cause social, professional and
financial harm.

"Movie records can be sensitive in some cases; it could be
embarrassing for someone to find out I like romantic comedies," Mr.
Shmatikov, the computer scientist, said in a recent phone interview.
"But definitely for health records, this is a huge issue."

And you don't need records containing a person's name and address to
figure out to whom the records belong, he said: "As our research
shows, pretty much any information that distinguishes one person from
another can be used to re-identify records."

The idea of an entirely paperless medical system holds the promise of
more efficient and cost-effective care. And, with the incentive of
stimulus package money, many companies are rushing to sell clinical
information systems to streamline services like patient scheduling,
sample tracking, and billing at hospitals and clinics.

In some cases, the same companies that sell data management systems to
hospitals and physicians also store that information and then
repackage it to make money on other services.

The clinical information systems market in the United States has sales
of $8 billion to $10 billion annually, and about 5 percent of that
comes from data and analysis, according to estimates by George Hill,
an analyst at Leerink Swann, a health care investment bank.

But by 2020, when a vast majority of American health providers are
expected to have electronic health systems, the data mining component
alone could generate sales of up to $5 billion, Mr. Hill said. Demand
for the data is likely to be robust. Policy makers and hospitals will
want to dig into it to analyze physician practices and glean
information about patient health trends.

Big players like the Cerner Corporation, which maintains electronic
health systems for 8,000 clients, including large hospitals and retail
clinics, and smaller players like Practice Fusion, which offers its
Web-based health record systems free to health care providers, say
they make use of patient data collected from their clients.

A spokeswoman for Cerner, whose Web site promotes its data mining of
"our vast warehouse of electronic health records," said the company
shares de-identified patient data with researchers or drug companies
looking for patients to participate in clinical trials. The patient
records are "double scrubbed," she said, explaining that the company
removes personal data like names and addresses before it runs a search
using a numbered code for each patient.

Other sensitive information, like mental health records, might be
removed before the patient data is sent out, she said.

The Web site of Practice Fusion, meanwhile, quotes Ryan Howard, the
chief executive, as saying that the company subsidizes its free
record-keeping systems by selling de-identified data to insurance groups,
clinical researchers and pharmaceutical companies. In an interview,
however, Mr. Howard said Practice Fusion had not yet started selling
patient information but that it intended to do so.

NEW regulations require notifying patients if their personally
identifiable medical information gets loose, and they prohibit selling
protected health records. But privacy advocates said electronic health
records remain vulnerable because no federal law now forbids the sale
of de-identified health care data.

In 1997, for example, a researcher identified the medical records of
William Weld, then the governor of Massachusetts, by correlating
birthdays, ZIP codes and gender in voter registration rolls and
information published by the state's government insurance commission.
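
A correlation of this kind can be sketched in a few lines of code. The
example below is only an illustration under stated assumptions: every
record, name and field name in it is hypothetical, and it joins two
small lists on birth date, ZIP code and sex, keeping a match only when
that combination is unique in both lists.

```python
from collections import defaultdict

# Hypothetical "de-identified" medical rows: names removed, other fields kept.
medical_rows = [
    {"birth_date": "1950-01-15", "zip": "02101", "sex": "M", "diagnosis": "condition A"},
    {"birth_date": "1962-07-04", "zip": "02139", "sex": "F", "diagnosis": "condition B"},
    {"birth_date": "1962-07-04", "zip": "02139", "sex": "F", "diagnosis": "condition C"},
]

# Hypothetical public voter roll: names plus the same three fields.
voter_rows = [
    {"name": "Voter One", "birth_date": "1950-01-15", "zip": "02101", "sex": "M"},
    {"name": "Voter Two", "birth_date": "1971-03-22", "zip": "02101", "sex": "F"},
    {"name": "Voter Three", "birth_date": "1962-07-04", "zip": "02139", "sex": "F"},
]

# Fields that are not names but can still single a person out.
QUASI_IDENTIFIERS = ("birth_date", "zip", "sex")


def key_of(row):
    """Build the join key from the quasi-identifier fields."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link_records(medical, voters):
    """Return {name: diagnosis} for keys that are unique in both datasets."""
    medical_by_key = defaultdict(list)
    for row in medical:
        medical_by_key[key_of(row)].append(row)

    voters_by_key = defaultdict(list)
    for row in voters:
        voters_by_key[key_of(row)].append(row)

    matches = {}
    for key, med_group in medical_by_key.items():
        voter_group = voters_by_key.get(key, [])
        # Only an unambiguous one-to-one match counts as a re-identification.
        if len(med_group) == 1 and len(voter_group) == 1:
            matches[voter_group[0]["name"]] = med_group[0]["diagnosis"]
    return matches


print(link_records(medical_rows, voter_rows))
```

In this made-up data, only one voter is linked to a diagnosis. The
third voter shares a key with two medical rows, so that match stays
ambiguous, which shows why the uniqueness of the combined fields, not
the presence of a name, determines whether a record can be traced.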

"There are no current federal laws against re-identification," said Dr.
Deborah Peel, a psychiatrist who is a director of Patient Privacy
Rights, a nonprofit watchdog group in Austin, Tex.

"Once personal health data gets out there, it's like the Paris Hilton
sex tape," Dr. Peel said. "It is going to be out there forever."
