Privacy Isn't Dead, or At Least It Shouldn't Be

14 Jul 2007

      <http://sciam.com/print_version.cfm?articleID=6A2EF194-E7F2-99DF-3323DA6BA4346B0B>

Scientific American

June 27, 2007

Privacy Isn't Dead, or At Least It Shouldn't Be: A Q&A with Latanya Sweeney

In a post-9/11 world, where security demands are high, personal privacy
does not have to be sacrificed, says computer scientist Latanya Sweeney,
who discusses a few ways to save it.

By Chip Walter

As security concerns mount, networks proliferate and ever more data move
online, personal privacy and anonymity are often the first casualties. For
the Insights story, "A Little Privacy, Please," appearing in the August
2007 issue of Scientific American, Chip Walter sat down with Carnegie
Mellon computer scientist Latanya Sweeney, who discusses the new threats to
privacy and ways to fight identity theft and other misuse of information.

Why is privacy versus security becoming such a problem? Why should we even
care?

(Laughs) Well, one issue is we need privacy. I don't mean political issues.
We literally can't live in a society without it. Even in nature animals
have to have some kind of secrecy to operate. For example, imagine a lion
that sees a deer down at a lake and it can't let the deer know he's there
or [the deer] might get a head start on him. And he doesn't want to
announce to the other lions [what he has found] because that creates
competition. There's a primal need for secrecy so we can achieve our goals.

Privacy also allows an individual the opportunity to grow and make mistakes
and really develop in a way you can't do in the absence of privacy, where
there's no forgiving and everyone knows what everyone else is doing. There
was a time when you could mess up on the east coast and go to the west
coast and start over again. That kind of philosophy was revealed in a lot
of things we did. In bankruptcy, for example. The idea was, you screwed up,
but you got to start over again. With today's technology, though, you
basically get a record from birth to grave and there's no forgiveness. And
so as a result we need technology that will preserve our privacy.

How did you get into this line of work? What drew you to mathematics and
computer science?

When I was a kid [in about third or fourth grade], one of my earliest
memories was wanting to build a black box that could learn as fast as I
could. We could learn together, or it could teach me. I wanted the sort of
teaching-learning experience that could go as fast and as deep as I could
go.

What triggered this black box fantasy?

In hindsight, I think I was bored in school, because I would finish the
assignments and would have to wait for the rest of the class. I think it
was an outlet and I began spending hours fantasizing about this box. It
[eventually] became a real passion ... so when I went on to high school and
took my first computer course, that childhood vision and this sort of
natural interest with computer programming just melded together.

After high school, you went to M.I.T. How did it feel being one of the few
girls in a predominantly male college?

I first went to M.I.T. in 1977. But it was a tough adjustment. I came from
a top-notch prep school for girls and going from that environment to
M.I.T.--well, it's almost impossible to be more opposite, in every possible
way. It was huge, it was in the city, I couldn't sleep it was [so] loud. Oh
man.

But the thing that really made it hard for me was the faculty; I had a lot
of incidents with the faculty that were really obnoxious.

What kind of obnoxious incidents are you talking about?

The way M.I.T. is structured is that in your freshman year the lectures are
in huge halls with over 100 students, and then you go into smaller groups
on the same subject that had only 10 to 12 students. Every week there were
10 problems in a problem set. So the guys [in our group] came to me and
said, "Look, we're going to start a study group," and I said, "What's a study
group?" And they said, "Well, every week we're going to get 10 problems,
and every week one of us will get assigned a problem and the day before the
homework is due, we'll all meet and your job is to tell the rest of the
group your solution and then they don't copy it down, you explain it to
them and they go write it up themselves, or if we don't think that's the
right solution, we'll discuss it." And I said, "Oh okay. That sounds good
to me."

Basically you had 10 people turning in the same assignment. So everyone
gets the assignments back and they got 10 out of 10, 10 out of 10, 10 out
of 10, seven out of 10. So I go to the instructor and I ask him, "Why would
you give me seven out of 10?" And he says, "Well you didn't show enough of
the work to show the process." So [I try again and] scores are 10 out of
10, 10 out of 10, 10 out of 10, seven out of 10. I ask the instructor
again, "Why did I get seven out of 10?" And he says, "Well you showed so
much detail that it seems you didn't really understand the concepts." So
then I go through this thing where I'm trying to get the right amount of
detail.

Did you ever figure out the real reason behind those seven out of 10s?

One day we had these resistance cubes -- this is an engineering class --
and they have colors on them and the colors tell how much resistance is
inside the canister. We had to memorize these color codes. So in one class
the instructor says, "The way I remembered the resistor [color codes] when
I was a lad at M.I.T. was the following: 'Black boys rape only young girls
but Violet gives willingly.'" When he said that I think I understood the
[reason behind the] seven out of 10.

Later I left M.I.T. and started my own computer company for 10 years. Then
I went to Harvard, and then went from Harvard back to M.I.T. as a graduate
student.

What was it like to go back to the same department?

When I returned, that teacher was the head of the department. And it was
really funny, but you know what, when I got back my attitude was "I'm not
taking any crap." And I had absolutely no problems in my graduate career.
But in my undergraduate years I was definitely not prepared for what I had
to deal with there.

And now you are head of the Data Privacy Lab at Carnegie Mellon University.
Why did you create it?

One day I was in grad school and [in my research] I came across this letter
that roughly said: at the age of two the patient was sexually molested, at
the age of three she stabbed her sister with scissors, at the age of four,
her parents got divorced, at the age of five she set fire to her home. And
then I realized there was nothing in that description that [would be
changed by] scrubbing out identifiable information. I'll bet you there's
only one person with that experience. And that made me realize that
identifiability is really fascinating, and it made me realize that I didn't
understand a thing about privacy. Removing the explicit identifiers wasn't
what it was about. I realized there's a lot more to this than a notion of
what makes me identifiable.

And it was then that I started realizing that privacy in the data space is
a little bit different. It requires tracking people where they go. And when
all this technology began exploding, you begin to realize that it's huge.

So what makes your lab different from others that look into these issues?

I started the lab to do what I call "research by fire." We don't operate
like a think tank or tackle abstract problems. If you have a real world
crisis, you can come to our lab, give us a little money, and we will solve
your problem. But because these are real world problems, it really is
research by fire. We don't have the luxury of sitting back and speculating
and thinking. The judge needs a decision and an answer now, otherwise
so-and-so is going to sue. So companies and government agencies give us
grants as partners in the lab and they give us problems that need solutions
within a given time period and the goal is to solve those problems.

What kinds of problems do you tackle?

All kinds, from DNA privacy, video piracy to problems with losing revenue
streams, being sued or filing suit. A lot of the technologies [we have
developed] came from that sort of work.

We roll up our sleeves and ask "How do I learn really sensitive
information? How do I exploit the heck out of the data that seems so
innocent out there?" And if we're really good at doing that then we can
create strategies for controlling privacy abuse.

When a problem comes to us, whether it's bioterrorism or something else, we
find ourselves doing a deep dive into that policy setting or regulatory
environment, usability issues and even the business issues. We have to take
on all of these constraints and come up with a solution, and often that's a
new technology, sometimes it's just a patch, very rarely is it just a
recommendation. And that's what we do.

Your Identity Angel software is able to gather up disconnected pieces of
information about people from data available all over the Internet. How
does it work?

It is very easy to do scans for individuals from information that is
publicly available or freely given away or sold for a cost. That means you
don't have to break into a system to get data you're not supposed to have;
it means you can gather the information from what is already out there.

[Earlier in my career] I had learned that if I had the date of birth,
gender and a five-digit zip code of a person, I could identify 87 percent
of the people in the United States. So even if you don't give me your
social security number, I can find out who you are nearly nine out of 10
times.
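
The re-identification idea above can be sketched in a few lines of Python.
This is only an illustration, not the method behind the 87 percent figure:
the field names (`birth_date`, `gender`, `zip5`) and the sample records are
made up, and the function simply measures what share of records is the only
one with its particular combination of those three fields.

```python
from collections import Counter

def fraction_unique(records, keys=("birth_date", "gender", "zip5")):
    """Return the share of records whose combination of `keys` is unique."""
    if not records:
        return 0.0
    # Count how many records share each (birth_date, gender, zip5) combination.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Made-up sample data (hypothetical people, for illustration only).
people = [
    {"birth_date": "1985-03-02", "gender": "F", "zip5": "15213"},
    {"birth_date": "1985-03-02", "gender": "F", "zip5": "15213"},
    {"birth_date": "1985-03-02", "gender": "M", "zip5": "15213"},
    {"birth_date": "1990-11-17", "gender": "F", "zip5": "02139"},
]

print(fraction_unique(people))  # 0.5: two of the four records are unique
```

A record that is unique on these fields can be matched to any other list
that carries the same fields alongside a name, which is why no social
security number is needed.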

That led to Identity Angel?

One of the things we suspected at the lab was that people in their early
20s with credit cards were especially vulnerable to identity theft. Our lab
began looking at this, and we found that that is a time in people's lives
when they're not very stable. Their addresses are changing continuously,
and so if you were to [steal an identity and] get a credit card in their
name, the fact that an address had changed was not something that would
trigger a red flag.

What else makes 20-somethings especially vulnerable to identity theft?

The other thing is that they don't have a lot of prior credit records, and
credit card companies are really anxious to give them credit cards. At the
same time there is a lot of information about them on the internet since
they're in that age group where they are used to creating web pages on
Facebook and MySpace. A lot of the information also came from students
routinely releasing their information by putting it in their résumés. Why
would anybody put a social security number on his or her résumé? But they
did.

All of this simplified creating a fraudulent student credit card -- a name
and address ... a social security number, and date of birth. The challenge of
Identity Angel was to find and combine this information from the internet.
It mines information including resumes off the Internet and looks for ones
that have the information, social security number, date of birth, etc. --
enough information to get a credit card in the person's name.
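
A rough sketch of this kind of scan, in Python, might look like the
following. It is not the Identity Angel code, whose internals the interview
does not describe. The regular expressions, the function names
(`scan_document`, `is_at_risk`) and the sample text are all assumptions made
for illustration, and real documents would need far more careful matching.

```python
import re

# Assumed, deliberately simple patterns; real data is much messier.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")        # e.g. 123-45-6789
DOB_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")    # e.g. 4/12/1986
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def scan_document(text):
    """Collect SSN-like, birth-date-like and email-like strings from text."""
    return {
        "ssn": SSN_RE.findall(text),
        "dob": DOB_RE.findall(text),
        "email": EMAIL_RE.findall(text),
    }

def is_at_risk(findings):
    """Flag a document that exposes both an SSN and a date of birth."""
    return bool(findings["ssn"]) and bool(findings["dob"])

# Made-up resume text (hypothetical person, placeholder SSN).
sample = "Jane Doe\nSSN: 123-45-6789\nDOB: 4/12/1986\nEmail: jane.doe@example.edu\n"
findings = scan_document(sample)
print(is_at_risk(findings), findings["email"])
```

The email address is collected separately because it is the channel for
warning the person, not part of what a thief would need.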

What does Identity Angel do with that mined information?

If it succeeds, the software then tries to find an email address and send
[the victim] an email letting them know we found this information.

You also developed a program called k-anonymity. What would be an example
of its use?

We have a project with the Department of Housing and Urban Development.
They want to know where people have been without knowing who they are. And
in this case, they never want to know who they are. So I built this system
that allows them to do this. It is actually tracking the homeless. Congress
appropriated a large amount of money in 2004 to create the Homeless
Management Information System. And the idea of the system is to track the
service utilizations of the homeless, and that's because there are a whole
lot of questions about homelessness and they want a system that gathers
that information.

Congress says this is about money. The cost of homelessness is exploding.
Is this because there are too many homeless, because they are eating too
much food, or because there is fraud in the system? What's actually
happening?

Why do homeless people need privacy protection?

There is one special class of homelessness for which privacy became
critical, and that's domestic violence victims, and it turns out they
account for a huge percentage of the dollars spent in the system. They are
afraid that the person stalking them [will find them]. So they wanted to be able to track
people, but do it in a way that even if you knew all of the intimate
details about that person, even if you got access to the data, you still
couldn't identify that person.
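
As a minimal sketch of the k-anonymity property itself, and not of the HUD
system described above: a table is k-anonymous over a chosen set of fields
if every combination of values in those fields is shared by at least k
records. The field names (`age_range`, `zip3`) and the rows below are
hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Made-up shelter records with already-generalized fields.
rows = [
    {"age_range": "20-29", "zip3": "152", "service": "shelter"},
    {"age_range": "20-29", "zip3": "152", "service": "meals"},
    {"age_range": "30-39", "zip3": "021", "service": "shelter"},
    {"age_range": "30-39", "zip3": "021", "service": "counseling"},
]

print(is_k_anonymous(rows, ["age_range", "zip3"], k=2))  # True
print(is_k_anonymous(rows, ["age_range", "zip3"], k=3))  # False
```

Someone who knows a person's age range and ZIP prefix can narrow the table
down only to a group of at least k records, not to one individual.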

This required a deep dive into cryptography. The earlier "scrub system" I
had developed [such as Identity Angel] was all about text. That's just
text mining. But this led us into different areas -- video, face
identification, etc. -- which is a deep dive into computer graphics and
computer vision.

So how do we solve the privacy problem? What are the best and worst-case
scenarios?

My answer is that the privacy problems that I've seen are probably best
solved by the person who first created the technology. What we really have
to do is train engineers and computer scientists to design and build
technologies in the right kind of way from the beginning.

Normally, engineers and computer scientists get ideas for technologies on
their own and engage in a kind of circular thinking and develop a prototype
of their solution and then do some kind of testing. But we are saying we
will give them tools that help them see who are the stakeholders and do a
risk assessment, and then see what barriers will come up and deal with the
riskiest problems and work to solve them in the technology design.

I think if we are successful in producing a new breed of engineers and
computer scientists, society will really benefit. The whole
technology-dialectics thing is really aiming at how you should go about
teaching engineers and computer scientists to think about user acceptance
and social adoption [and also that they] have to think about barriers to
technology [from the beginning].

So the best scenario is that this kind of training takes hold and as new
technologies emerge they are less likely to be constantly clashing with
accept-or-reject options.

Is it hard to break down those cultural barriers and change the way people
work?

There should be privacy technology departments, because there are no
technologies for handling privacy problems [proactively]. The best
solutions lie in the technology design. So we are targeting the creation of
tools for the engineers and computer scientists, to give them software
tools that help them work in a way they are already used to working and
give them a way to gather all of the right information and then infuse it
in their designs.

And a lot of the time, the financial model isn't there to do it. Sometimes
society gets so annoyed at what happens, and it ends up on the front page
of the New York Times. The reaction isn't always rational. Policy doesn't
have the nuances of the technology.

If we build the right designs in up front, then society can decide how to
turn those controls on and off. But if the technology is built without
controls, it forces us to either accept the benefits of the technology
without controls, or cripple it by adding them later.

Several years ago, Scott McNealy, the CEO of Sun Microsystems, famously
quipped, "Privacy is dead. Get over it."

Oh privacy is definitely not dead. When people say you have to choose, it
means they haven't actually thought the problem through or they aren't
willing to accept the answer.

Remember, it's in [McNealy's] interest to say that, because he very much
shares that attitude of the computer scientist who built the technology
that's invasive; who says, "Well, you want the benefits of my technology,
you'll get over privacy." It's exactly the kind of computer scientist we
don't want to be graduating in the future. We want the computer scientist
who will resolve these kinds of barriers in conflict, identify them and
resolve them in their technology design.

So where do you see the big problems?

It really is pretty much everywhere. Identity management is a critical
problem that we just keep ignoring. Social security numbers are a whole
discussion unto themselves -- how they've outplayed themselves, do they
need to be replaced? Now in law enforcement and the department of justice,
they're saying it should be fingerprints. So now we'll see little devices
in computers and cars and even refrigerators with very expensive
fingerprint readers on them. But that's a problem because fingerprints
could become the next social security number. They could give us all the
ills of the social security number and worse. I can't get rid of my
fingerprint, it goes with me wherever I go. I don't wear my social security
number on my head.

How would it be stolen, though? What specifically would you see as the
problem with fingerprints?

Well, we leave them everywhere, which is really good for law enforcement
because they know where to find us at all times, but that means that anyone
could pick them up. The point is you can see the progression. Fingerprint
databases will proliferate all over and that will create problems. Someone
could access the database and replicate your fingerprint and make a card,
which wouldn't really be their card, it'd be yours.

So this leaves more and more bits of data about all of us out there on the
Internet, including email?

Yes, and you can tell a lot about a person that way, you can even
impersonate them. That's another thing I expect to see over the next five
years. Thieves will do a little research on you, impersonate you and maybe
send an email to someone you know to elicit funds because now they have
more information about you.

Medical privacy is a sensitive area, too.

The big vulnerabilities there come from the insurance companies and
employers, people who ultimately pay the medical bills. Those parties have
an interest in knowing what you have been diagnosed with and making
decisions that impact your employment or income. There was an article
written about a banker in Maryland who used to cross a cancer registry with
people who had loans and mortgages with his bank, and would then call in
those loans. Now the story was retracted, because it was being debated
whether it was true or not. But the person who was responsible for the
story showed me lots of documentation that showed it was true. But the
point I am making is that true or false, it is certainly easy to do. And
you can see the financial incentives. So the problem with Scott McNealy's
approach, the trusted agent approach, is that to the extent that they are
the only party to see the data, maybe society can trust them. But the truth
is ... you aren't the only party [who can get your hands on the information],
and what you are advocating is not just for you but a lot of parties that
you simply can't be accountable for.

DNA data is becoming more widely available now. If you only have the DNA of
a person and nothing else, could you find out who that person is?

In one project, we chose to look at patients with Huntington's disease,
because it's easy to spot in the DNA. One part of the DNA will repeat, and
that's normal, but if you have Huntington's it repeats a huge amount of
times. Also the longer the repeat, the earlier the age of onset of the
disease. So we could make a prediction about the person's age at the time
they were diagnosed with it. These were all Huntington's patients in the
state of Illinois. We then used hospital discharge information that was
publicly available and looked for diagnoses of [discharged] Huntington's
patients and began to match them to the DNA to identify those people. We
successfully matched 20 out of 22 people. That was shocking.
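
The matching step can be pictured with a small Python sketch. It is an
assumption-laden illustration, not the lab's procedure: the predicted age at
diagnosis is taken as a given input (how repeat length maps to age is not
specified here), the two-year tolerance is arbitrary, and the field names
and records are invented. A DNA sample is linked only when exactly one
discharge record falls within the tolerance.

```python
def link_unique(dna_profiles, discharges, tolerance_years=2):
    """Link each DNA sample to a discharge record when exactly one fits."""
    links = {}
    for profile in dna_profiles:
        # Candidates are patients whose recorded age at diagnosis is close
        # to the age predicted from the DNA sample.
        candidates = [
            d["patient"]
            for d in discharges
            if abs(d["age_at_diagnosis"] - profile["predicted_age"]) <= tolerance_years
        ]
        if len(candidates) == 1:  # ambiguous or empty matches are skipped
            links[profile["sample_id"]] = candidates[0]
    return links

# Made-up inputs (hypothetical samples and patients).
dna_profiles = [
    {"sample_id": "s1", "predicted_age": 35},
    {"sample_id": "s2", "predicted_age": 50},
    {"sample_id": "s3", "predicted_age": 42},
]
discharges = [
    {"patient": "A", "age_at_diagnosis": 36},
    {"patient": "B", "age_at_diagnosis": 49},
    {"patient": "C", "age_at_diagnosis": 41},
    {"patient": "D", "age_at_diagnosis": 43},
]

print(link_unique(dna_profiles, discharges))  # {'s1': 'A', 's2': 'B'}
```

Sample s3 stays unlinked because two patients fit equally well, which shows
why a small, distinctive population makes this kind of linkage easier.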

Are we postponing the privacy problem, or are we confronting it?

A lot of the surveillance can be done with privacy protections. But under
the current administration, those in Homeland Security call it the 'P
word'. Their statement is that as long as you don't say the P word you
don't have a P problem, whether you do or you don't. So the FBI gets
slapped on the wrist for gathering all of this additional data, but a lot
of that could have been anonymized. But right now, there is no funding or
interest in using these technologies at all.

-- 
-----------------
R. A. Hettinga <mailto: rah@ibuc.com>
The Internet Bearer Underwriting Corporation <http://www.ibuc.com/>
44 Farquhar Street, Boston, MA 02131 USA
"... however it may deserve respect for its usefulness and antiquity,
[predicting the end of the world] has not been found agreeable to
experience." -- Edward Gibbon, 'Decline and Fall of the Roman Empire'
