Realities of Data Mining / Brokering You and Beyond
https://www.reddit.com/r/news/comments/62fjdr/cards_against_humanity_creator...
https://www.reddit.com/user/delftblauw
http://www.stopdatamining.me/opt-out-list/

I wanted to counter the notion that this is not personally identifiable information. The data from ISPs by itself is not. It is true that you won't be able to call the ISP and ask for the search history of /u/delftblauw (I am boring anyway). The point is rather that the little slice of data the ISPs provide is one part of the whole pie that is your digital profile. I worked in the industry that aggregates and models your data and wanted to give some insight into how this all works for those willing to read this wall of text.

I started in IT fresh out of college at a small marketing company in the Midwest that aggregated personal information and demographics to sell as lists for companies to market to. We worked with a larger company that provided us a "master list" of persons in the US. This master list was kind of like a digital phone book, with a lot of aggregate demographics attached to each record. This larger company gathered all sorts of information in aggregate from private and public sources. Attributes such as political affiliation are garnered from lists bought from political parties and candidates you donated to. Personal interests come from the magazines you subscribe to, even the charities and causes you donate to. Of course your age, estimated income, household size, education, etc. are all folded in.

The aggregation here works off of semi-anonymous identifiers and keys that are incredibly interconnected digitally, such as your home address, email address, phone numbers, and even usernames for niche topics. The aggregation is pulled together from disparate sources. For example, let's say Company A gave us your name and email address along with the fact that you are "likely" a Democrat since you contributed $10 to a Bernie Sanders PAC.
Company B is "more secure" so they only provide your email address and the fact that you said you make over $100k on their survey. Company C gives us your name and home address along with the fact that you subscribe to National Geographic. We can query census data to find the average income in your census block based on your address. Also, after linking the public records provided by your county government, I see your home address has a lien on it from the HOA dues you didn't pay.

I say it's semi-anonymous because on their own these elements don't identify you, and they are subject to change. However, these attributes have varying degrees of permanency. You are far more likely to change your home address than your email address, and you are more likely to have multiple email addresses than multiple phone numbers. There is a reason LinkedIn and Facebook Messenger will not shut up about asking for your phone number.

Now think of things like Reddit. How many accounts do you have? One, for sure, right? Maybe another for a throwaway? Your main account has all that sweet karma though, no way you're going to abandon that. If Reddit sells your username together with the email address you registered with (no idea if they do this, this is purely a hypothetical everyone reading here can relate to), all of a sudden the "master list" now has your Reddit username. Tack on the fact that the subreddits you are subscribed to indicate likely interests. For instance, if you are subscribed to a very loud subreddit dedicated to the current US president along with other conservative subreddits, you are likely a Republican. If you are subscribed to those along with Democrat-leaning subreddits, you are likely highly interested in politics in general.

My job was to build everything from queries to ANN models to create a profile for you.
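
To make the Company A/B/C example concrete, here is a minimal sketch of that kind of key-based linking. It is an illustration only, not the tooling we actually used: the feed contents, the field names, and the `link_records` helper are all made up, and it matches on exact key values where a real pipeline would use fuzzy matching and confidence scores.

```python
from collections import defaultdict

# Hypothetical feeds mirroring the Company A/B/C example. All values are invented.
company_a = [{"name": "Pat Doe", "email": "pat@example.com", "party_guess": "Democrat"}]
company_b = [{"email": "pat@example.com", "income_over_100k": True}]
company_c = [{"name": "Pat Doe", "address": "12 Elm St", "magazine": "National Geographic"}]
county_liens = [{"address": "12 Elm St", "hoa_lien": True}]

# Fields treated as linking keys (an assumption); everything else is an attribute.
KEY_FIELDS = ("name", "email", "address")


def link_records(*sources):
    """Merge records that share any key value, using union-find over records."""
    records = [rec for src in sources for rec in src]
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, normalized value) -> index of first record with it
    for i, rec in enumerate(records):
        for field in KEY_FIELDS:
            if field in rec:
                key = (field, rec[field].strip().lower())
                if key in first_seen:
                    parent[find(i)] = find(first_seen[key])
                else:
                    first_seen[key] = i

    # Fold every record into the profile of its group.
    profiles = defaultdict(dict)
    for i, rec in enumerate(records):
        profiles[find(i)].update(rec)
    return list(profiles.values())


profiles = link_records(company_a, company_b, company_c, county_liens)
print(profiles)
```

Note that no single feed contains both the email address and the lien. The email links A to B, the name links A to C, and the home address links C to the county record, so all four collapse into one profile.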
Given the basic example above, three companies and public data just let me identify, or at least make an educated guess at linking, your name, email address, phone number, address, hobbies, income, and public delinquencies. This is basic, and there is a lot more that goes into it, but it should give a rough idea. I did it. Facebook does it. Reddit does it. And now your ISPs are doing it and will continue doing it.

Now, the way we monetize this is from the aggregate list. The more data we have on you, and the higher our confidence in that data, the more valuable you are to us. One of our partners was a large website used for researching new car purchases. They would give us their data to build into our models and sell on their behalf. When a car manufacturer called us and asked for a list of households with 5+ people, income of 75k+, and a car first registered 4+ years ago, who were researching competitor models, we could look at our models and pull back a list of names, email addresses, phone numbers, etc. that they could market to. This is how you get the email that says, "Instead of the CX9, take a look at the all new 2018 Subaru Ascent!" just a few days after you were casually looking at a car. For digital companies like Facebook, Google, and Reddit, this data is largely self-serving: it lets them target ads to you as you use the site.

Now that said, this is all absolutely personally identifiable. I routinely queried myself as well as friends and family to pull back their "full profile". Everyone I showed had the same reaction, going from utter surprise to a mix of embarrassment and vulnerability. I always tried to reassure them that there was nothing "crazy" we could find out, but if I had wanted to, I could have checked the debt and credit history, political party, general interests, and gender (no surprises!) of all the dates I had in my 20s. Just joking, I was a data engineer, I didn't have any dates!
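
For the curious, the car manufacturer's list request described above boils down to a plain filter over the aggregated profiles. A rough sketch follows, with the caveat that the `Household` fields, the sample rows, the `pull_list` helper, and the reference year are all assumptions for illustration, not our real schema.

```python
from dataclasses import dataclass

CURRENT_YEAR = 2017  # assumption: used only to compute the car's age


@dataclass
class Household:
    name: str
    email: str
    size: int                      # people in the household
    income: int                    # estimated annual income in USD
    car_first_registered: int      # year the current car was first registered
    researched_models: frozenset   # models viewed on the partner website


# Invented sample rows.
households = [
    Household("Pat Doe", "pat@example.com", 5, 90_000, 2012, frozenset({"CX9"})),
    Household("Sam Roe", "sam@example.com", 2, 120_000, 2010, frozenset({"CX9"})),
    Household("Lee Poe", "lee@example.com", 6, 80_000, 2016, frozenset({"CX9"})),
    Household("Kim Low", "kim@example.com", 5, 76_000, 2011, frozenset()),
]


def pull_list(rows, competitor_models, min_size=5, min_income=75_000, min_car_age=4):
    """Return households that meet every criterion in the request."""
    return [
        h for h in rows
        if h.size >= min_size
        and h.income >= min_income
        and CURRENT_YEAR - h.car_first_registered >= min_car_age
        and h.researched_models & competitor_models  # researched a competitor model
    ]


leads = pull_list(households, {"CX9"})
for h in leads:
    print(h.name, h.email)
```

Only the first sample household passes all four checks. The others fail on household size, car age, or not having researched a competitor model.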
After I left there, I made sure to opt out of the master list we received from the larger company. There are so many damn companies doing this now that it's exhausting. Once your data is out there, it's out there forever. If my old company was the only company that had, say, the ability to know whether I contributed to the Endangered Toucan Fund, and a company came along asking for a list of people who support endangered wildlife, they could purchase my email address, and all of a sudden my old company has just become Company N from the example above. I have learned to just accept this as a fact of the digital age.

Edit: For those looking to opt out of the large private data aggregation and mining that is going on all over, here is a good list to start from (the stopdatamining.me opt-out list linked at the top). It's whack-a-mole though, guys and gals. For every one you opt out of, a new startup is niche mining something else. Try to make peace or push public policy.