Privacy Isn't Dead, or At Least It Shouldn't Be: A Q&A with Latanya Sweeney

Eugen Leitl eugen at leitl.org
Fri Jul 13 11:43:58 PDT 2007


http://www.sciam.com/print_version.cfm?articleID=6A2EF194-E7F2-99DF-3323DA6BA4346B0B

June 27, 2007
In a post-9/11 world, where security demands are high, personal privacy does
not have to be sacrificed, says computer scientist Latanya Sweeney, who
discusses a few ways to save it.
		
By Chip Walter

As security concerns mount, networks proliferate and ever more data move
online, personal privacy and anonymity are often the first casualties. For
the Insights story, "A Little Privacy, Please," appearing in the August 2007
issue of Scientific American, Chip Walter sat down with Carnegie Mellon
computer scientist Latanya Sweeney, who discusses the new threats to privacy
and ways to fight identity theft and other misuse of information.

Why is privacy versus security becoming such a problem? Why should we even
care?

(Laughs) Well, one issue is we need privacy. I don't mean political issues.
We literally can't live in a society without it. Even in nature animals have
to have some kind of secrecy to operate. For example, imagine a lion that
sees a deer down at a lake and it can't let the deer know he's there or [the
deer] might get a head start on him. And he doesn't want to announce to the
other lions [what he has found] because that creates competition. There's a
primal need for secrecy so we can achieve our goals.

Privacy also allows an individual the opportunity to grow and make mistakes
and really develop in a way you can't do in the absence of privacy, where
there's no forgiving and everyone knows what everyone else is doing. There
was a time when you could mess up on the east coast and go to the west coast
and start over again. That kind of philosophy was revealed in a lot of things
we did. In bankruptcy, for example. The idea was, you screwed up, but you got
to start over again. With today's technology, though, you basically get a
record from birth to grave and there's no forgiveness. And so as a result we
need technology that will preserve our privacy.

How did you get into this line of work? What drew you to mathematics and
computer science?

When I was a kid [in about third or fourth grade], one of my earliest
memories, was wanting to build a black box that could learn as fast as I
could. We could learn together, or it could teach me. I wanted the sort of
teaching-learning experience that could go as fast and as deep as I could go.

What triggered this black box fantasy?

In hindsight, I think I was bored in school, because I would finish the
assignments and would have to wait for the rest of the class. I think it was
an outlet and I began spending hours fantasizing about this box. It
[eventually] became a real passion, and so when I went on to high school and
took my first computer course, that childhood vision and a natural interest
in computer programming just melded together.

After high school, you went to M.I.T. How did it feel being one of the few
girls in a predominantly male college?

I first went to M.I.T. in 1977. But it was a tough adjustment. I came from a
top-notch prep school for girls and going from that environment to
M.I.T.--well, it's almost impossible to be more opposite, in every possible
way. It was huge, it was in the city, I couldn't sleep, it was [so] loud. Oh
man.

But the thing that really made it hard for me was the faculty; I had a lot of
incidents with the faculty that were really obnoxious.

What kind of obnoxious incidents are you talking about?

The way M.I.T. is structured is that in your freshman year the lectures are
in huge halls with over 100 students, and then you go into smaller groups on
the same subject that had only 10 to 12 students. Every week there were 10
problems in a problem set. So the guys [in our group] came to me and said,
"Look, we're going to start a study group," and I said, "What's a study group?"
And they said, "well, every week we're going to get 10 problems, and every
week one of us will get assigned a problem and the day before the homework is
due, we'll all meet and your job is to tell the rest of the group your
solution and then they don't copy it down, you explain it to them and they go
write it up themselves, or if we don't think that's the right solution, we'll
discuss it." And I said, "Oh okay. That sounds good to me."

Basically you had 10 people turning in the same assignment. So everyone gets
the assignments back and they got 10 out of 10, 10 out of 10, 10 out of 10,
seven out of 10. So I go to the instructor and I ask him, "Why would you give
me seven out of 10?" And he says, "Well you didn't show enough of the work to
show the process." So [I try again and] scores are 10 out of 10, 10 out of
10, 10 out of 10, seven out of 10. I ask the instructor again, "Why did I get
seven out of 10?" And he says, "Well you showed so much detail that it seems
you didn't really understand the concepts." So then I go through this thing
where I'm trying to get the right amount of detail.

Did you ever figure out the real reason behind those seven out of 10s?

One day we had these resistance cubes -- this is an engineering class -- and
they have colors on them and the colors tell how much resistance is inside
the canister. We had to memorize these color codes. So in one class the
instructor says, "The way I remembered the resistor [color codes] when I was
a lad at M.I.T. was the following: 'Black boys rape only young girls but
Violet gives willingly." When he said that I think I understood the [reason
behind the] seven out of 10.

Later I left M.I.T. and started my own computer company, which I ran for 10
years. Then I went to Harvard, and then from Harvard back to M.I.T. as a
graduate student.

What was it like to go back to the same department?

When I returned, that teacher was the head of the department. And it was
really funny, but you know what, when I got back my attitude was "I'm not
taking any crap." And I had absolutely no problems in my graduate career. But
in my undergraduate years I was definitely not prepared for what I had to
deal with there.

And now you are head of the Data Privacy Lab at Carnegie Mellon University.
Why did you create it?

One day I was in grad school and [in my research] I came across this letter
that roughly said: at the age of two the patient was sexually molested, at
the age of three she stabbed her sister with scissors, at the age of four,
her parents got divorced, at the age of five she set fire to her home. And
then I realized there was nothing in that description that [would be changed
by] scrubbing out identifiable information. I'll bet you there's only one
person with that experience. And that made me realize that identifiability is
really fascinating, and it made me realize that I didn't understand a thing
about privacy. Removing the explicit identifiers wasn't what it was about. I
realized there's a lot more to this than a notion of what makes me
identifiable.

And it was then that I started realizing that privacy in the data space is a
little bit different. It requires tracking people where they go. And when all
this technology began exploding, you begin to realize that it's huge.

So what makes your lab different from others that look into these issues?

I started the lab to do what I call "research by fire." We don't operate like
a think tank or tackle abstract problems. If you have a real world crisis,
you can come to our lab, give us a little money, and we will solve your
problem. But because these are real world problems, it really is research by
fire. We don't have the luxury of sitting back and speculating and thinking.
The judge needs a decision and an answer now, otherwise so-and-so is going to
sue. So companies and government agencies give us grants as partners in the
lab and they give us problems that need solutions within a given time period
and the goal is to solve those problems.

What kinds of problems do you tackle?

All kinds, from DNA privacy and video piracy to problems with losing revenue
streams, being sued or filing suit. A lot of the technologies [we have
developed] came from that sort of work.

We roll up our sleeves and ask "How do I learn really sensitive information?
How do I exploit the heck out of the data that seems so innocent out there?"
And if we're really good at doing that then we can create strategies for
controlling privacy abuse.

When a problem comes to us, whether it's bioterrorism or something else, we
find ourselves doing a deep dive into that policy setting or regulatory
environment, usability issues and even the business issues. We have to take
on all of these constraints and come up with a solution, and often that's a
new technology, sometimes it's just a patch, very rarely is it just a
recommendation. And that's what we do.

Your Identity Angel software is able to gather up disconnected pieces of
information about people from data available all over the Internet. How does
it work?

It is very easy to do scans for individuals from information that is publicly
available or freely given away or sold for a cost. That means you don't have
to break into a system to get data you're not supposed to have; it means you
can gather the information from what is already out there.

[Earlier in my career] I had learned that if I had the date of birth, gender
and a five-digit zip code of a person, I could identify 87 percent of the
people in the United States. So even if you don't give me your social
security number, I can find out who you are nearly nine out of 10 times.
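The quasi-identifier effect Sweeney describes can be sketched in a few lines. This is a toy illustration only; the records below are fabricated, and the point is just that counting how many people share each (birth date, gender, ZIP) combination reveals how many are uniquely identifiable without any explicit identifier.

```python
from collections import Counter

# Fabricated toy population: (date of birth, gender, 5-digit ZIP).
population = [
    ("1975-03-02", "F", "15213"),
    ("1975-03-02", "M", "15213"),
    ("1981-07-19", "F", "60614"),
    ("1981-07-19", "F", "60614"),  # two people share this combination
    ("1990-11-30", "M", "02139"),
]

# A person is re-identifiable when their quasi-identifier combination
# appears exactly once in the population.
counts = Counter(population)
unique = [combo for combo, n in counts.items() if n == 1]
fraction_identifiable = sum(counts[c] for c in unique) / len(population)

print(f"{fraction_identifiable:.0%} of this toy population is uniquely "
      "identified by (birth date, gender, ZIP)")
```

On real U.S. census-scale data, Sweeney's result is that this fraction is about 87 percent.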

That led to Identity Angel?

One of the things we suspected at the lab was that people in their early 20s
with credit cards were especially vulnerable to identity theft. Our lab began
looking at this, and we found that that is a time in people's lives when
they're not very stable. Their addresses are changing continuously, and so if
you were to [steal an identity and] get a credit card in their name, the fact
that an address had changed was not something that would trigger a red flag.

What else makes 20-somethings especially vulnerable to identity theft?

The other thing is that they don't have a lot of prior credit records, and
credit card companies are really anxious to give them credit cards. At the
same time there is a lot of information about them on the internet since
they're in that age group where they are used to creating web pages on
Facebook and MySpace. A lot of the information also came from students
routinely releasing their information by putting it in their résumés. Why
would anybody put a social security number on his or her résumé? But they
did.

All of this simplified creating a fraudulent student credit card -- a name
and address, a social security number, and a date of birth. The challenge of
Identity Angel was to find and combine this information from the Internet.
It mines information, including résumés, off the Internet and looks for ones
that have enough of it -- social security number, date of birth, etc. -- to
get a credit card in the person's name.
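The real Identity Angel code is not public, so the following is only an illustrative sketch under assumptions of my own: the patterns, function name, and sample text are all hypothetical. It shows the core idea of scanning free text, such as a résumé posted online, for the combination of fields sufficient to open a credit card in someone's name.

```python
import re

# Hypothetical patterns for the two riskiest fields.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # SSN-shaped string
DOB_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")   # date-of-birth-shaped string

def looks_exploitable(text: str) -> bool:
    """Flag text that exposes both an SSN-shaped and a DOB-shaped string."""
    return bool(SSN_RE.search(text)) and bool(DOB_RE.search(text))

resume = "Jane Doe, born 04/12/1985, SSN 123-45-6789, 100 Main St."
print(looks_exploitable(resume))  # True -- this text would be flagged
```

A scanner like this would then, as Sweeney describes below, try to find an email address for the flagged person and warn them.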

What does Identity Angel do with that mined information?

If it succeeds, the software then tries to find an email address and send
[the victim] an email letting them know we found this information.

You also developed a program called k-anonymity. What would be an example of
its use?

We have a project with the Department of Housing and Urban Development. They
want to know where people have been without knowing who they are. And in this
case, they never want to know who they are. So I built this system that
allows them to do this. It is actually tracking the homeless. Congress
appropriated a large amount of money in 2004 to create the Homeless
Management Information System. And the idea of the system is to track the
service utilizations of the homeless, and that's because there are a whole
lot of questions about homelessness and they want a system that gathers that
information.
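The track-without-identifying property the question refers to is the core of k-anonymity: coarsen the quasi-identifiers until every remaining combination describes at least k people, so no record is unique. A minimal sketch, with fabricated records and an assumed generalization rule (truncated ZIP, decade age buckets):

```python
from collections import Counter

# Fabricated service-utilization records with quasi-identifiers.
records = [
    {"zip": "15213", "age": 24}, {"zip": "15217", "age": 27},
    {"zip": "15213", "age": 29}, {"zip": "15232", "age": 41},
    {"zip": "15237", "age": 43}, {"zip": "15232", "age": 45},
]

def generalize(rec):
    """Coarsen quasi-identifiers: 3-digit ZIP prefix, 10-year age bucket."""
    bucket = (rec["age"] // 10) * 10
    return (rec["zip"][:3] + "**", f"{bucket}-{bucket + 9}")

def is_k_anonymous(recs, k):
    """True if every generalized combination covers at least k records."""
    counts = Counter(generalize(r) for r in recs)
    return all(n >= k for n in counts.values())

print(is_k_anonymous(records, k=3))  # True: each combination covers 3 records
```

Analysts can still count where and when services were used, but no generalized record points back to a single person.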

Congress says this is about money. The cost of homelessness is exploding. Is
this because there are too many homeless, because they are eating too much
food, or because there is fraud in the system? What's actually happening?

Why do homeless people need privacy protection?

There is one special class of homelessness for which privacy became critical,
and that's domestic violence victims, and it turns out they account for a
huge percentage of the dollars spent in the system. They are afraid that the
person stalking them will find them through the system. So they wanted to be
able to track people, but do it in a way that even if you knew all of the
intimate details about that person, even if you got access to the data, you
still couldn't identify that person.

This required a deep dive into cryptography. The earlier "scrub system" I had
developed [such as Identity Angel] was all about text. That's just text
mining. But this led us into different areas -- video, face identification,
etc. -- which is a deep dive into computer graphics and computer vision.

So how do we solve the privacy problem? What are the best and worst-case
scenarios?

My answer is that the privacy problems that I've seen are probably best
solved by the person who first created the technology. What we really have to
do is train engineers and computer scientists to design and build
technologies in the right kind of way from the beginning.

Normally, engineers and computer scientists get ideas for technologies on
their own and engage in a kind of circular thinking and develop a prototype
of their solution and then do some kind of testing. But we are saying we will
give them tools that help them see who are the stakeholders and do a risk
assessment, and then see what barriers will come up and deal with the
riskiest problems and work to solve them in the technology design.

I think if we are successful in producing a new breed of engineers and
computer scientists, society will really benefit. The whole
technology-dialectics thing is really aiming at how you should go about
teaching engineers and computer scientists to think about user acceptance and
social adoption [and also that they] have to think about barriers to
technology [from the beginning].

So the best scenario is that this kind of training takes hold and as new
technologies emerge they are less likely to be constantly clashing with
accept-or-reject options.

Is it hard to break down those cultural barriers and change the way people
work?

There should be privacy technology departments, because there are no
technologies for handling privacy problems [proactively]. The best solutions
lie in the technology design. So we are targeting the creation of tools for
the engineers and computer scientists, to give them software tools that help
them work in a way they are already used to working and give them a way to
gather all of the right information and then infuse it in their designs.

And a lot of the time, the financial model isn't there to do it. Sometimes
society gets so annoyed at what happens, and it ends up on the front page of
the New York Times. The reaction isn't always rational. Policy doesn't have
the nuances of the technology.

If we build the right designs in up front, then society can decide how to
turn those controls on and off. But if the technology is built without
controls, it forces us to either accept the benefits of the technology
without controls, or cripple it by adding them later.

Several years ago, Scott McNealy, the CEO of Sun Microsystems, famously
quipped, "Privacy is dead. Get over it."

Oh privacy is definitely not dead. When people say you have to choose, it
means they haven't actually thought the problem through or they aren't
willing to accept the answer.

Remember, it's in [McNealy's] interest to say that, because he very much
shares that attitude of the computer scientist who built the technology
that's invasive; who says, "Well, you want the benefits of my technology,
you'll get over privacy." It's exactly the kind of computer scientist we
don't want to be graduating in the future. We want the computer scientist who
will resolve these kinds of barriers in conflict, identify them and resolve
them in their technology design.

So where do you see the big problems?

It really is pretty much everywhere. Identity management is a critical
problem that we just keep ignoring. Social security numbers are a whole
discussion unto themselves -- how they've outlived their usefulness, whether
they need to be replaced. Now in law enforcement and the Department of
Justice, they're
saying it should be fingerprints. So now we'll see little devices in
computers and cars and even refrigerators with very expensive fingerprint
readers on them. But that's a problem because fingerprints could become the
next social security number. They could give us all the ills of the social
security number and worse. I can't get rid of my fingerprint, it goes with me
wherever I go. I don't wear my social security number on my head.

How would it be stolen, though? What specifically would you see as the
problem with fingerprints?

Well, we leave them everywhere, which is really good for law enforcement
because they know where to find us at all times, but that means that anyone
could pick them up. The point is you can see the progression. Fingerprint
databases will proliferate all over and that will create problems. Someone
could access the database and replicate your fingerprint and make a card,
which wouldn't really be their card, it'd be yours.

So this leaves more and more bits of data about all of us out there on the
Internet, including email?

Yes, and you can tell a lot about a person that way, you can even impersonate
them. That's another thing I expect to see over the next five years. Thieves
will do a little research on you, impersonate you and maybe send an email to
someone you know to elicit funds because now they have more information about
you.

Medical privacy is a sensitive area, too.

The big vulnerabilities there come from the insurance companies and
employers, people who ultimately pay the medical bills. Those parties have an
interest in knowing what you have been diagnosed with and making decisions
that impact your employment or income. There was an article written about a
banker in Maryland who used to cross a cancer registry with people who had
loans and mortgages with his bank, and would then call in those loans. Now
the story was retracted, because it was being debated whether it was true or
not. But the person who was responsible for the story showed me lots of
documentation that showed it was true. But the point I am making is that true
or false, it is certainly easy to do. And you can see the financial
incentives. So the problem with Scott McNealy's approach, the trusted agent
approach, is that to the extent that they are the only party to see the data,
maybe society can trust them. But the truth is, you aren't the only party
[who can get your hands on the information], and what you are advocating is
not just for you but a lot of parties that you simply can't be accountable
for.

DNA data is becoming more widely available now. If you only have the DNA of a
person and nothing else, could you find out who that person is?

In one project, we chose to look at patients with Huntington's disease,
because it's easy to spot in the DNA. One part of the DNA will repeat, and
that's normal, but if you have Huntington's it repeats a huge amount of
times. Also the longer the repeat, the earlier the age of onset of the
disease. So we could make a prediction about the person's age at the time
they were diagnosed with it. These were all Huntington's patients in the
state of Illinois. We then used hospital discharge information that was
publicly available and looked for diagnoses of [discharged] Huntington's
patients and began to match them to the DNA to identify those people. We
successfully matched 20 out of 22 people. That was shocking.
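The matching logic Sweeney describes can be sketched as follows. Everything here is fabricated for illustration, including the repeat-to-onset rule (the real clinical relationship is more complex); the point is only that a longer CAG repeat implies an earlier predicted onset age, which can be linked to ages appearing in public discharge records.

```python
def predicted_onset_age(cag_repeats: int) -> int:
    """Hypothetical linear rule: more repeats -> earlier onset age."""
    return max(10, 80 - cag_repeats)

# Anonymous DNA samples: repeat counts only, no names.
dna_samples = {"sample_a": 42, "sample_b": 50}
# Public discharge data: (record id, age at Huntington's diagnosis).
discharges = [("Patient 17", 38), ("Patient 23", 30)]

# Link each DNA sample to the discharge whose diagnosis age is closest
# to the age predicted from the repeat length.
matches = {}
for sample, repeats in dna_samples.items():
    age = predicted_onset_age(repeats)
    matches[sample] = min(discharges, key=lambda d: abs(d[1] - age))[0]

print(matches)  # sample_a -> Patient 17, sample_b -> Patient 23
```

With real repeat counts and real discharge records, this style of linkage is what re-identified 20 of 22 supposedly anonymous patients.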

Are we postponing the privacy problem, or are we confronting it?

A lot of the surveillance can be done with privacy protections. But under the
current administration, those in Homeland Security call it the 'P word'.
Their statement is that as long as you don't say the P word you don't have a
P problem, whether you do or you don't. So the FBI gets slapped on the wrist
for gathering all of this additional data, but a lot of that could have been
anonymized. But right now, there is no funding or interest in using these
technologies at all.




