Eckert's first task with the data was to find out if her browsing data was included in the dataset. To do this, she queried the data for the URL linked with her company's login page, which generates a unique ID for each employee. Germany has a population of about 82 million, so the odds that Eckert herself was in browser data collected from 3 million Germans was small. Although it turned out her browser history wasn't in the data set, by querying the data for her company's login page Eckert discovered that a number of her colleagues were in the data by matching the unique login IDs from the company's page to the individuals.
With this information, Eckert would've been able to see her colleagues' entire browsing history for the last month. One of the colleagues included in the dataset was a close friend of hers, and she reached out to him to let him know that she had his browsing history. The question she had was which browser plugin was collecting and selling this data.
To answer this question, Eckert had her colleague delete one browser plugin every hour until he disappeared from the live data. On the seventh plugin, he disappeared. This suggested that the plugin collecting and selling his browser data was, ironically enough, called Web of Trust, which offers "free tools for safe search and web browsing."
The troubling thing about Eckert and Dewes' de-anonymization technique is that it can be used on anyone who has a public social media presence. For their report, Eckert and Dewes focused on Twitter and the German LinkedIn equivalent, Xing, to see if they could use these public profiles to de-anonymize public figures in the data.
When you click on your analytics page on Twitter, this brings you to a URL that includes your public Twitter handle—Xing has a similar feature. This means that Eckert and Dewes were able to query the database for these publicly available Twitter URLs for German politicians.
If the politicians were included in the dataset, the next step was to visit the Twitter profile of the politician and collect a few of the links they had recently posted. By using these links, coupled with the public Twitter URL, Eckert and Dewes were able to pull an individual's entire month-long browsing history from the anonymous dataset.
As Dewes pointed out when he and I spoke at Def Con, it requires an astonishingly small amount of browsing information to identify an individual out of an anonymous dataset of 3 million people. Since everyone's browsing habits are unique, it only takes about 10 website visits to create a "fingerprint" for an individual based on which websites they are visiting and when.
https://motherboard.vice.com/en_us/article/gygx7y/your-anonymous-browsing-da...