http://www.wayner.org/books/td/ Translucent Databases Do you have personal information in your database? Do you keep files on your customers, your employees, or anyone else? Do you need to worry about European laws restricting the information you keep? Do you keep copies of credit card numbers, social security numbers, or other information that might be useful to identity thieves or insurance fraudsters? Do you deal with medical records or personal secrets? Most database administrators have some of these worries. Some have all of them. That's why database security is so important. This new book, Translucent Databases, describes a different attitude toward protecting the information. Most databases provide elaborate control mechanisms for letting the right people in to see the right records. These tools are well-designed and thoroughly tested, but they can only provide so much support. If someone breaks into the operating system itself, all of the data on the hard disk is unveiled. If a clerk, a supervisor, or a system administrator decides to turn traitor, there's nothing anyone can do. Translucent databases provide better, deeper protection by scrambling the data with encryption algorithms. The solutions use the minimal amount of encryption to ensure that the database is still functional. In the best applications, the personal and sensitive information is protected but the database still delivers the information. Order directly from the publisher and get a free copy of Free for All . "I would like to recommend this book to everyone who is storing sensitive information in their database. Credit card numbers or other private information from customer statistics data can fall into the wrong hands and give someone else too valuable insights in specific customers behavior." -- Michael Widenius, MySQL Now order from Amazon.com ----------------------------------- Translucent Databases contains several dozen examples written in basic SQL and Java. The code is written to be easy-to-follow and portable. All of the code can be extended and modified to fit a number of different applications. Here are some of the examples: * A database that hides the position of Navy ships from enemies while simultaneously providing accurate information to those with proper authorization. * An anti-rape database that identifies trends without containing any personal information. * A babysitter scheduling service that matches parents with available sitters while protecting the sitters' identities and locations'. * A department store database that guards the modesty of customers. * A private accounting system that detects fraud without revealing information. * A poker game for the Internet that prevents cheating. * A pharmacy database for preventing dangerous drug interactions while keeping medical records secure. * A tool for travel agents to protect their clients from stalkers and kidnappers. * A stock exchange transaction mechanism designed to stop insider-trading. * A website logfile tool that provides accurate counts of visitors while protecting their identities. * A credit-card database for defending crucial e-commerce transactions. * A patent search tool that doesn't reveal the nature and focus of the search. * A conference bulletin board that routes messages without helping stalkers. * A tool for studying the radon concentration in homes without maintaining personal information. * An anti-money laundering database. Anyone who purchases the book receives an unlimited license to use the source code from the examples on up to ten CPUs. If you have greater needs, other licenses are available. Or just buy another copy of the book. ----------------- A Supplementary Syllabus If you're a professor teaching a database course, you may want to use Translucent Databases as an additional textbook. You are welcome to consider this one week module presents some of the most important concepts from Translucent Databases. It consists ofthree parts that roughly correspond to the three hours spent in a classroom in a typical week. Part I -- One-Way Functions * One-way functions are easy to compute but hard to reverse. * Some of the common ones are MD5, SHA, and raising a number to a power modulo a prime number. This section will just use generic one-way functions and call them h(x). There is no reason to do more with advanced mathematics. * Most common one-way functions are not truly impossible to reverse-- they're just practically impossible. Describe how hash functions like MD5 produce their answer. How long does it take to search for a collision? How long does it take to do brute force attack? * Show how to protect passwords using this approach. Anyone can look at the file and anyone can test a password presented as real. But no one can take the password database and work backwards to determine the password * Show how to protect credit cards. (Some systems leave the last four digits in the clear. Mention that this is a hint for how information is treated in Part III.) * Show how multiple people can use h(x) to look up information instead of just x. This can be used to synchronize schedules or protect personal information. * Show how to design a store database that stores h(name) instead of name. * Emphasize that the regular SQL database features still work with the fields of the database that aren't scrambled by h. Part II -- Determining Reality * Digital signatures can use one-way functions. This section won't use the more sophisticated, traditional versions like RSA or Diffie-Hellman, although it could. It will only use simpler versions that are often called Message Authentication Codes. Describe how this is a weaker restriction. * Someone can create a signature or MAC by computing h(password,document). Only someone with the right password can check the signature and see if it was generated by the document. * Show how fake entries in the database can disguise the real ones. * Only someone with the password can distinguish between the real and the fake. Part III -- Blurring Reality with Quantization * Quantization is the act of taking a number from a big set and assigning it the closest value from a smaller subset. * Rounding off values is one form of quantization. * More sophisticated algorithms don't distribute the small set of surrogates evenly over the larger set. * Some basic algorithms block some fields if it makes it too easy to identify the human behind the record. * Other algorithms add random amounts to the data to disguise the true value. * Some encrypt this random amount so some users can get the real values. * Show how this can be applied to medical records used for research. * Show how this can help hide the position of ships. Sample Homework Questions: * Write a program to try random values of x until MD5(x) ends with the sixteen bit value FF. How many random values should it take? Run your program. Do you come close? Repeat this 1000 times and report the average number of samples that must be tested before one is found. Now, extrapolate how long it will take for your computer to completely find an answer that matches a complete 160-bit result from MD-5. * Create a tool for protecting medical records in a trial. Determine which fields to scramble and which fields to leave in the clear. * Describe some possible attacks against the scheduling algorithms described in Chapter 4. * Describe three ideal databases where one-way functions can prevent abuse. Describe several examples where the technique will fail. * Describe three ideal databases where false entries can distract attackers. Describe several cases where the fake entries will corrupt the database. Can this problem be avoided? * Describe three examples where blurring data with quantization can add enough confusion to block attackers. Can you think of examples where too much confusion also confounds the regular users? Are there examples where there's no middle ground? -------- Table of Contents 1--Translucency-- 1.1--Some Examples-- 1.2--Limits-- 1.3--How to Use the Book-- 1.4--Some Examples-- 2--One Way Functions-- 2.1--Pure One-Way Functions-- 2.1.1--Discrete Log-- 2.1.2--The Secure Hash Algorithm or SHA-- 2.1.3--MD-5-- 2.2--Public Key or Trapdoor Function-- 2.3--Secret Key Functions-- 2.3.1--Turning a secret key function into a pure one-way function.-- 2.3.2--Turning One-Way Functions Into Secret-Key Encryption Functions-- 2.4--Implementations-- 2.4.1--MySQL-- 2.4.2--PostgreSQL-- 2.4.3--Oracle-- 2.4.4--Client-side Applications-- 2.5--Conclusions-- 2.5.1--Lessons-- 3--One Way Tables-- 3.1--An Example from a Department Store-- 3.1.1--Adding Security-- 3.2--Cleaning Up One-Way Input-- 3.2.1--Some Java Code-- 3.3--Security Trade Offs-- 3.3.1--Slowing the One-Way Functions-- 3.3.2--Salt-- 3.4--Adding Redundancy-- 3.5--An Example with Encryption for Security-- 3.5.1--Some Java Code-- 3.6--Hashing Instead of Encryption-- 3.7--Serial Queries-- 3.8--Keeping Some Information In the Clear-- 3.8.1--Inserting a Credit Card Number-- 3.8.2--Using the Information-- 3.9--Conclusions-- 3.9.1--Lessons-- 4--Coordinating Users-- 4.1--A Bulletin Board Example-- 4.1.1--Adding a Shared Password-- 4.2--Special One-Way Functions-- 4.2.1--Creating A Public Key-- 4.2.2--Using the Public Key-- 4.2.3--Recovering Messages-- 4.2.4--Using Public-Key One-Way Functions-- 4.3--Conclusion-- 4.3.1--Lessons-- 5--Synchronization-- 5.0.2--The BabySitter's Table-- 5.0.3--Adding More Names-- 5.0.4--Multiple Tables-- 5.0.5--Adding Extra Information-- 5.0.6--Security-- 5.1--Conclusions-- 5.1.1--Lessons-- 6--Evolving Data-- 6.1--An Auction Example-- 6.1.1--The First Bid-- 6.1.2--Adding New Bids-- 6.1.3--Creating Bids-- 6.1.4--The Value of Counter-- 6.1.5--Better Hash Functions-- 6.2--Working With Encryption-- 6.3--Conclusions-- 6.3.1--Lessons-- 7--Sharing-- 7.1--The Algorithms-- 7.1.1--More Precise Algorithms-- 7.1.2--More Efficient Algorithms-- 7.1.3--Adding Sophistication-- 7.2--Nuclear Launch Codes-- 7.2.1--Adding Launch Codes-- 7.2.2--Recovering the Code-- 7.2.3--Adding More Security-- 7.3--A Public-Key Example-- 7.3.1--Adding a Message-- 7.3.2--Retrieving the Message-- 7.4--Conclusions-- 7.4.1--Lessons-- 8--Revelation-- 8.1--A Masquerade-- 8.2--Lottery-- 8.2.1--Paying for the Ticket-- 8.2.2--Placing Bets-- 8.2.3--Testing Winners-- 8.3--Sports Poker and Multiple Columns-- 8.3.1--Inserting Predictions-- 8.3.2--Testing and Verifying-- 8.4--Identity Cards and Selective Revelations-- 8.4.1--The Basic Mathematics-- 8.4.2--A Rental Car Example-- 8.4.3--The License-- 8.4.4--Proving Information-- 8.4.5--The Rental Car Company-- 8.5--Conclusions-- 8.5.1--Lessons-- 9--Quantization-- 9.1--Algorithms-- 9.1.1--Adaptive Quantization-- 9.1.2--Projection-- 9.2--Using Quantization In Databases-- 9.2.1--Adding Random Noise-- 9.2.2--Adding Encryption-- 9.3--Quantized One-Way Functions-- 9.3.1--One-Way Functions and Noise-- 9.4--Conclusions-- 9.4.1--Lessons-- 10--Authentication-- 10.1--Digital Signature Taxonomy-- 10.1.1--One-Way Functions and Signatures-- 10.1.2--Modular Exponentiation and Signatures-- 10.2--Adding Digital Signatures To SQL Databases-- 10.2.1--A Hash-based Signature-- 10.2.2--Signatures Using Exponentiation-- 10.3--Fake Information-- 10.3.1--An Appointment System-- 10.3.2--Adding Entries With Signatures-- 10.3.3--Adding Fake Entries-- 10.3.4--Finding the Results-- 10.3.5--Modifications-- 10.4--Conclusions-- 10.4.1--Lessons-- 11--Accounting-- 11.1--Sales Force Accounting-- 11.1.1--Adding Values-- 11.1.2--Checking Things Out-- 11.2--Conclusions-- 11.2.1--Lessons-- 12--Tokens-- 12.1--Prescription Records-- 12.1.1--Inserting Records-- 12.1.2--A Relatively Fast Mechanism for Retrieval-- 12.1.3--A More Secure Mechanism-- 12.1.4--At the client-- 12.1.5--At the database-- 12.1.6--Using transparency-- 12.1.7--Dealing with the Challenge-- 12.2--Conclusions-- 12.2.1--Lessons-- 13--Private Retrieval-- 13.1--Stock Prices From Multiple Sources-- 13.2--A Single Server Example-- 13.2.1--Using More Decoys-- 13.3--A Patent Example-- 13.4--Conclusions-- 13.4.1--Lessons-- A--Further Reading-- ---------------------- -- ----------------- R. A. Hettinga <mailto: rah@ibuc.com> The Internet Bearer Underwriting Corporation <http://www.ibuc.com/> 44 Farquhar Street, Boston, MA 02131 USA "... however it may deserve respect for its usefulness and antiquity, [predicting the end of the world] has not been found agreeable to experience." -- Edward Gibbon, 'Decline and Fall of the Roman Empire'