
I ran across an interesting problem on the STAT-L mailing list. I came up with an initial solution, but it didn't fully solve the problem. I will summarize:

In medical research (the application in question, though I am sure there are others) it is desirable to have a large database of individual medical histories available to search for correlations, risk factors, etc. The problem, of course, is that many individuals want their medical histories kept private. It is therefore necessary to maintain a database whose records cannot be traced back to individuals. An additional requirement is that people must be able to add new information to their own records as it becomes available.

The researcher who initially posed the question suggested adding random data to "encrypt anonymity". My first-cut solution was to hash the individual's name (perhaps together with some other information, or random data, to thwart dictionary attacks) and submit the records under the hashed name. If done correctly, this should keep the name itself from being recovered from the record identifier.

The problem is that a medical record contains so much data that its contents alone could very probably be used to tie it to a particular person, no matter how well the name is hidden.

Does anyone have any insights into this problem?

<disclaimer>
This is of purely academic interest to me. I don't know the person who asked the initial question (other than through email). It just sounds like a neat problem.
</disclaimer>

        Clay

---------------------------------------------------------------------------
Clay Olbon II            | Clay.Olbon@dynetics.com
Systems Engineer         | ph: (810) 589-9930 fax 9934
Dynetics, Inc., Ste 302  | http://www.msen.com/~olbon/olbon.html
550 Stephenson Hwy       | PGP262 public key: on web page
Troy, MI 48083-1109      | pgp print: B97397AD50233C77523FD058BD1BB7C0
                        TANSTAAFL
---------------------------------------------------------------------------
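
For illustration only, the hashed-name idea in the message above could be sketched roughly as follows. Everything concrete here is an assumption rather than something the post specifies: the use of HMAC-SHA256 as the hash, a random secret that each individual keeps privately (standing in for the "random info" meant to thwart dictionary attacks), and the hypothetical names make_secret, pseudonym, add_entry, and database.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory stand-in for the research database:
# maps a record ID to the list of entries filed under it.
database = {}


def make_secret():
    """Generate a random secret (assumption: the individual keeps it private)."""
    return secrets.token_bytes(32)


def pseudonym(name, secret):
    """Derive a stable record ID from a name and a secret via HMAC-SHA256.

    Without the secret, an attacker cannot simply hash a list of known
    names and compare the results against the record IDs.
    """
    # Normalize case and whitespace so the same person always maps to the same ID.
    normalized = " ".join(name.lower().split())
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def add_entry(name, secret, entry):
    """File an entry under the pseudonymous record ID and return that ID."""
    record_id = pseudonym(name, secret)
    database.setdefault(record_id, []).append(entry)
    return record_id


if __name__ == "__main__":
    secret = make_secret()
    first = add_entry("Jane Doe", secret, "1994: initial visit")
    # Later additions land in the same record because the ID is deterministic.
    second = add_entry("jane doe", secret, "1995: follow-up")
    print(first == second, len(database[first]))
```

Because the ID depends only on the name and the secret, the same person can keep appending to one record without the database ever storing the name. This sketch addresses only the identifier; it does nothing about the concern raised in the message that the contents of a record may themselves identify the person.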