http://www.boston.com/bostonglobe/ideas/articles/2009/09/20/project_gaydar_a...

The Boston Globe

Project "Gaydar"

At MIT, an experiment identifies which students are gay, raising new questions about online privacy

By Carolyn Y. Johnson
Globe Staff / September 20, 2009

It started as a simple term project for an MIT class on ethics and law on the electronic frontier.

Two students partnered up to take on the latest Internet fad: the online social networks that were exploding into the mainstream. With people signing up in droves to reconnect with classmates and old crushes from high school, and even becoming online "friends" with their family members, the two wondered what the online masses were unknowingly telling the world about themselves. The pair weren't interested in the embarrassing photos or overripe profiles that attract so much consternation from parents and potential employers. Instead, they wondered whether the basic currency of interactions on a social network - the simple act of "friending" someone online - might reveal something a person might rather keep hidden.

Using data from the social network Facebook, they made a striking discovery: just by looking at a person's online friends, they could predict whether the person was gay. They did this with a software program that looked at the gender and sexuality of a person's friends and, using statistical analysis, made a prediction. The two students had no way of checking all of their predictions, but based on their own knowledge outside the Facebook world, their computer program appeared quite accurate for men, they said. People may be effectively "outing" themselves just by the virtual company they keep.

"When they first did it, it was absolutely striking - we said, 'Oh my God - you can actually put some computation behind that,'" said Hal Abelson, a computer science professor at MIT who co-taught the course.
"That pulls the rug out from a whole policy and technology perspective that the point is to give you control over your information - because you don't have control over your information."

The work has not been published in a scientific journal, but it provides a provocative warning note about privacy. Discussions of privacy often focus on how to best keep things secret, whether it is making sure online financial transactions are secure from intruders, or telling people to think twice before opening their lives too widely on blogs or online profiles. But this work shows that people may reveal information about themselves in another way, and without knowing they are making it public. Who we are can be revealed by, and even defined by, who our friends are: if all your friends are over 45, you're probably not a teenager; if they all belong to a particular religion, it's a decent bet that you do, too. The ability to connect with other people who have something in common is part of the power of social networks, but also a possible pitfall. If our friends reveal who we are, that challenges a conception of privacy built on the notion that there are things we tell, and things we don't.

"Even if you don't affirmatively post revealing information, simply publishing your friends' list may reveal sensitive information about you, or it may lead people to make assumptions about you that are incorrect," said Kevin Bankston, senior staff attorney for the Electronic Frontier Foundation, a nonprofit digital rights organization in San Francisco. "Certainly if most or many of your friends are of a particular religious or political or sexual category, others may conclude you are part of the same category - even if you haven't said so yourself."

The project, given the name "Gaydar" by the students, Carter Jernigan and Behram Mistree, is part of the fast-moving field of social network analysis, which examines what the connections between people can tell us.
The applications run the gamut, from predicting who might be a terrorist to the likelihood a person is happy or fat. The idea of making assumptions about people by looking at their relationships is not new, but the sudden availability of information online means the field's powerful tools can now be applied to just about anyone.

For example, Murat Kantarcioglu, an assistant professor of computer science at the University of Texas at Dallas, found he could make decent predictions about a person's political affiliation. He and a student - who later went to work for Facebook - took 167,000 profiles and 3 million links between people from the Dallas-Fort Worth network. They used three methods to predict a person's political views. One prediction model used only the details in their profiles. Another used only friendship links. And the third combined the two sets of data.

The researchers found that certain traits, such as knowing what groups people belonged to or their favorite music, were quite predictive of political affiliation. But they also found that they did better than a random guess when only using friendship connections. The best results came from combining the two approaches.

Other work, by researchers at the University of Maryland, College Park, analyzed four social networks: Facebook, the photo-sharing website Flickr, an online network for dog owners called Dogster, and BibSonomy, in which people tag bookmarks and publications. Those researchers blinded themselves to the profiles of half the people in each network, and launched a variety of "attacks" on the networks, to see what private information they could glean by simply looking at things like groups people belonged to, and their friendship links.

On each network, at least one attack worked. Researchers could predict where Flickr users lived; Facebook users' gender, a dog's breed, and whether someone was likely to be a spammer on BibSonomy.
The authors found that membership in a group gave away a significant amount of information, but also found that predictions using friend links weren't as strong as they expected. "Using friends in classifying people has to be treated with care," computer scientists Lise Getoor and Elena Zheleva wrote.

The idea behind the MIT work, done in 2007, is as old as the adage that birds of a feather flock together. For years, sociologists have known of the "homophily principle" - the tendency for similar people to group together. People of one race tend to have spouses, confidants, and friends of the same race, for example.

Jernigan and Mistree downloaded data from the Facebook network, choosing as their sample people who had joined the MIT network and were in the classes 2007-2011 or graduate students. They were interested in three things people frequently fill in on their social network profile: their gender, a category called "interested in" that they took to denote sexuality, and their friend links. Using that information, they "trained" their computer program, analyzing the friend links of 1,544 men who said they were straight, 21 who said they were bisexual, and 33 who said they were gay. Gay men had proportionally more gay friends than straight men, giving the computer program a way to infer a person's sexuality based on their friends.

Then they did the same analysis on 947 men who did not report their sexuality. Although the researchers had no way to confirm the analysis with scientific rigor, they used their private knowledge of 10 people in the network who were gay but did not declare it on their Facebook page as a simple check. They found all 10 people were predicted to be gay by the program. The analysis seemed to work in identifying gay men, but the same technique was not as successful with bisexual men or women, or lesbians.

"It's just one example of how information could be inadvertently shared," said Jernigan.
"It does highlight risks out there."

The researchers treated their data anonymously, never using names except to validate [...] that they provide a resource that brings people together, but also may endanger privacy in ways they did not anticipate.

Social networks like Facebook already give people power over that information, with privacy features that allow people to hide their profiles, and even make their list of friends invisible to outsiders, as well as from select friends.

Because the features and services offered on social networks are new, they also evolve in response to user demand that may not always be anticipated by the company. In 2007, for example, Facebook introduced Beacon, a feature that broadcasted friends' activities - such as buying movie tickets on a specific website - like targeted advertisements. That drew an angry response from users concerned about privacy, and prompted an apologetic blog posting from Facebook cofounder Mark Zuckerberg, along with modifications that meant people could opt out.

Computer scientists are identifying the ways in which anyone from a potential employer to an advertiser might be able to make informed guesses about a person. But there are limits to online privacy, and ultimately, say some experts, people will simply have to weigh the costs and benefits of living online.

"You can do damage to your reputation with social networking data, and other people can do damage to you. I do think that there's been a very fast learning curve - people are quickly learning the dos and don'ts of Internet behavior," said Jason Kaufman, a research fellow at the Berkman Center for Internet and Society at Harvard University who is studying a set of Facebook data. "Potentially everything you ever do on the Internet will live forever. I like to think we'll all learn to give each other a little more slack for our indiscretions and idiosyncrasies."

Carolyn Y. Johnson is a science reporter at the Globe. E-mail cjohnson@globe.com.