Using some crypto to make gov't dataset identifiers better

Fri Mar 21 11:17:02 PDT 2014

So this is a little different from the usual fare here, but my colleague
Tom Lee at the Sunlight Foundation has been thinking about using basic
cryptographic concepts to convince governments to publish more unique
identifiers in their datasets -- even when the identifiers they have in
their *databases* are sensitive (like SSNs).
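
To make the idea concrete, here's a minimal sketch of one way it could
work. It is not necessarily what Tom has in mind. It assumes the agency
holds a secret key and publishes a keyed hash (HMAC-SHA256) of each
sensitive identifier instead of the identifier itself. The function name,
the key, and the sample SSN (an invalid one) are all made up for
illustration. A key matters because SSNs have only about a billion
possible values, so a plain unkeyed hash could be reversed by trying them
all.

```python
import hashlib
import hmac


def pseudonymous_id(secret_key: bytes, sensitive_id: str) -> str:
    """Derive a stable, publishable identifier from a sensitive one.

    Hypothetical helper: the same input and key always give the same
    output, so records can still be joined across datasets, but the
    original value can't be recovered without the key.
    """
    # Normalize so "000-12-3456" and "000123456" map to the same ID.
    normalized = "".join(ch for ch in sensitive_id if ch.isdigit())
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Assumption: the key is generated once and kept private by the agency.
key = b"example-key-kept-private-by-the-agency"

id_a = pseudonymous_id(key, "000-12-3456")
id_b = pseudonymous_id(key, "000123456")
print(id_a == id_b)  # same person, same published identifier
```

Anyone with the published datasets can link records that share an
identifier, but only the key holder can compute the identifier for a given
SSN.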

The problem of anonymizing unique data is in some senses easier here than
elsewhere, because in some gov't contexts, making things personally
identifiable isn't the problem -- the *intent* is to publish personally
identifiable, connect-able information, like for campaign donors and
lobbyists. So the Mosaic Effect (combining datasets to re-identify people,
as happened with the Netflix data) is less of a concern. It depends on the
problem, though.

After talking about it on a
Tom blogged it up:

Your feedback would be very welcome, either here or in public fora. Of
course, convincing government agencies to actually do this sort of thing
might be a challenge, but there are a lot of levels and branches of
government out there - you never know who might lead the way.

