Jurisdictionless Distributed Data Havens
With all of the debate on offshore data havens, I've been thinking of a way to implement a distributed data haven that would (hopefully) not be subject to any local jurisdiction and therefore would not need to be located in the place of cypherpunks' wet dreams ;). If this idea has been mentioned by someone else, I am not aware of it.

By distributing the data over several servers (using RAID-like striping), compromise of a single server or of multiple servers (depending on implementation specifics) would not cause a collapse of the data haven.

Here is a way such a data haven could be set up: The data is split into multiple parts. Each site is responsible for maintaining only its part. For a client to access a piece of data, he will have to contact a certain number of sites to reconstruct the data. The client could also use some sort of anonymizer service to do the data collection and then do the combining on his local machine.

If one site gets shut down by Big Brother, a denial of service attack, etc., the other sites either find a new site to replicate the missing part to, or they need to reconstruct the data and re-stripe it across the remaining servers. (Read any networking book with a section on RAID to see what all this means.)

Another idea is that the data can be encrypted, and the client pays the key holder for the key. The key holder would preferably be the content provider and not one of the servers. For example: Client finds Bombs R Us and wants to buy pipe bomb instructions. Bombs R Us gets anonymous payment from the client for the instructions. Bombs R Us says to the client: "collect" page 5 "parts" and use this key to decrypt. In this scheme, using anonymous digital cash, Bombs R Us can remain anonymous with his data publicly available but encrypted. He can pay each of the server maintainers in anonymous digital cash as his expense.
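The split, collect, and regenerate steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it stripes at the byte level, keeps a single dedicated XOR parity stripe (closer to RAID 4 than RAID 5, which rotates parity), and the function names `stripe`, `rebuild`, and `combine` are made up for this example. Nothing here handles the encryption, payment, or anonymity parts of the proposal.

```python
from functools import reduce
from typing import List, Optional


def xor_columns(stripes: List[bytes]) -> bytes:
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*stripes))


def stripe(data: bytes, n_sites: int) -> List[bytes]:
    """Split data round-robin into n_sites stripes plus one parity stripe."""
    pad = (-len(data)) % n_sites          # pad so all stripes have equal length
    padded = data + bytes(pad)
    stripes = [padded[i::n_sites] for i in range(n_sites)]
    return stripes + [xor_columns(stripes)]  # last entry is the parity site


def rebuild(parts: List[Optional[bytes]]) -> List[bytes]:
    """Regenerate at most one lost stripe (marked None) from the others."""
    missing = [i for i, p in enumerate(parts) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot recover more than one lost site")
    restored = list(parts)
    if missing:
        present = [p for p in parts if p is not None]
        restored[missing[0]] = xor_columns(present)
    return restored


def combine(parts: List[bytes], length: int) -> bytes:
    """Client-side reassembly: interleave the data stripes, drop the padding."""
    out = bytearray()
    for col in zip(*parts[:-1]):          # skip the parity stripe
        out.extend(col)
    return bytes(out[:length])


if __name__ == "__main__":
    document = b"page 5 of the catalogue"
    parts = stripe(document, 4)           # 4 data sites + 1 parity site
    parts[2] = None                       # one site gets shut down
    assert combine(rebuild(parts), len(document)) == document
```

With one parity stripe, the loss of any single site can be repaired; tolerating more simultaneous losses would need more redundancy than this sketch carries.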
Each server cannot (or should not) be held responsible for disseminating bomb-making instructions because each server does not *have* the instructions in a complete form (encrypted or otherwise). It would be like someone calling the cops and saying, "I placed a box with XXXXX in it at the airport." This in itself is not a threat. XXXXX could be anything, including OJ Simpson's bloody clothes. And if another guy called in and said "pipe bomb" and hung up, this is meaningless also.

The reason data striping is better than a simple mirroring network is that no single site contains anything useful in itself for the authorities to use against the server maintainer. (Similar to a remailer network, perhaps.)

An extra feature could be that if some major attack were initiated against the data haven, there could be a dead-man button of some sort to make the data vanish altogether by sending distress signals to the other servers (or to at least one server, which could then cascade the signal).

The system could either use RAID Level 5 data striping, or some hybrid scheme like this:

    Site 1 has bits 1 and 5 of data
    Site 2 has bits 2 and 6 of data
    Site 3 has bits 3 and 7 of data
    Site 4 has bits 4 and 8 of data
    Site 5 has 2 parity bits (1 parity bit per nibble)
    Site 6 has bits 1, 2, 3, and 4
    Site 7 has bits 5, 6, 7, and 8
    etc.

In this particular scheme, it would take the downing of at least 3 of the 7 servers to prevent data collection. It would also take at least 2 sites' worth of data to reconstruct the data into a usable form. As much or as little redundancy as needed can be built into the system.

The only time the data is whole is prior to being striped across the servers, during data recreation and re-striping, and when in the hands of the client.
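The two counts given for the seven-site hybrid scheme can be checked by brute force. The sketch below assumes that each parity bit on site 5 can recover exactly one missing bit within its nibble, and that a set of sites is "enough" when all 8 bits can be obtained; the names `SITES`, `recoverable`, `min_failures_to_break`, and `min_sites_to_read` are invented for this example.

```python
from itertools import combinations

# Data bits (numbered 1..8 as in the post) stored on each data-bearing site.
SITES = {
    1: {1, 5}, 2: {2, 6}, 3: {3, 7}, 4: {4, 8},
    6: {1, 2, 3, 4}, 7: {5, 6, 7, 8},
}
PARITY_SITE = 5                            # holds one parity bit per nibble
NIBBLES = [{1, 2, 3, 4}, {5, 6, 7, 8}]
ALL_SITES = set(range(1, 8))
ALL_BITS = set(range(1, 9))


def recoverable(up: set) -> bool:
    """True if the sites in `up` are enough to obtain all 8 bits."""
    known = set()
    for site in up:
        known |= SITES.get(site, set())
    if PARITY_SITE in up:
        for nibble in NIBBLES:
            if len(nibble - known) == 1:   # parity fixes one missing bit
                known |= nibble
    return known == ALL_BITS


def min_failures_to_break() -> int:
    """Smallest number of downed sites that can prevent reconstruction."""
    for k in range(1, len(ALL_SITES) + 1):
        for down in combinations(sorted(ALL_SITES), k):
            if not recoverable(ALL_SITES - set(down)):
                return k
    return len(ALL_SITES) + 1


def min_sites_to_read() -> int:
    """Smallest number of sites a client must contact to get all bits."""
    for k in range(1, len(ALL_SITES) + 1):
        for up in combinations(sorted(ALL_SITES), k):
            if recoverable(set(up)):
                return k
    return len(ALL_SITES) + 1


if __name__ == "__main__":
    print(min_failures_to_break())  # 3
    print(min_sites_to_read())      # 2
```

Under those assumptions the search agrees with the figures in the post: no pair of failures loses data, some triples do (for example sites 1, 2, and 6), and sites 6 and 7 together already hold every bit.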
Benefits:
* No single site contains any incriminating evidence
* Allows for a true "virtual company" to exist, with just a mailbox to receive its anonymous digital cash payments

Problems:
* Requires a re-striping of data when the data source is changed
* Requires a data regeneration and re-striping when a site goes down and a replacement site cannot be found (could possibly use an idle standby server to circumvent this problem)

Attacks:
* Government sabotages site 1, then watches site 2 for data regeneration before re-striping (can be thwarted by having data regeneration happen on a randomly picked server)

Clarifications: When I say digital cash, I am not referring to Chaum's DigiCash. DigiCash will never work in my book because it requires an account, amongst other reasons. YMMV.

Comments welcome.

Kevin Stephenson
-Silence is Security WWIIish Poster
Some annotations:

1. The easiest way to make such a distributed data haven (DDH) would be to use a distributed Unix file system that distributes chunks of bytes rather than whole files (striping).

Advantages:
- All normal services would work: ftp, http, ...
- Copying, deleting and modifying files.
- Easy to install and use.

Problems: Because each site can supply all data (collecting on the fly from the other DDHs), the site holder could be held responsible for the data. This could be prevented by collecting and assembling the data on the client side (e.g. using Java).

2. When the DDHs are distributed around the world in a lot of different states, it could be very difficult for any government to get any evidence of "illegal" data on one site.

-- stephan
Some annotations:
1. The easiest way to make such a distributed data haven (DDH) would be to use a distributed unix file system that doesn't distribute files but chunks of bytes. (Striping)
Like AFS?
Advantage: - All normal services would work: ftp, http, ... - Copying, deleting and modifying files. - Easy to install and use.
Problems: Because each site can supply all data (collecting on the fly from other DDHs), the site holder could be responsible for the data. This could be prevented by collecting and assembling data at the client side (e.g. using JAVA).
How many people trust clients? If we wanted clients, we would use WebStor from McAfee. I don't trust any client specific to one task, and would rather use generic e-mail/ftp/www.
2. When the DDHs are distributed around the world in a lot of different states, it could be very difficult for any government to get any evidence for "illegal" data on one site.
In some countries, when the government jails someone, it doesn't matter what evidence they have. Not everyone has a justice system that at least makes an attempt to give a fair trial.
[...]
If one site gets shut down by Big Brother, denial of service attack, etc., the other sites either find a new site to replicate the missing part to, or they need to reconstruct the data and re-stripe it for the remaining servers. (Read any networking book with a section on RAID to see what all this means.)
[...]
The reason data striping is better than a simple mirroring network is that no single site contains anything useful in itself for the authorities to use against the server maintainer. (Similar to a remailer network perhaps)
An extra feature could be if some major attack was initiated against the data haven, there could be a dead-man button of some sort to make the data vanish altogether by sending distress signals to the other servers (or to at least one server, which could then cascade the signal).
BB would follow the signal and pop another person with conspiracy.

I have been researching this for a while now, and have a pretty alpha reference implementation as well as a mailing list exactly on this topic.

The problem with a RAID 5 data haven is that something needs to be the controller, to put together and store/retrieve the data. This controller is a single point, and can be found out. What BB could do is smash the controller of the RAID array, then press charges against several of the "hard drive" owners for conspiracy.

I am working on a list for this topic (dh-l@lists.io.com, subscribe on majordomo@lists.io.com), but I have had problems with getting the reply block correct, most likely due to me being very new to majordomo-type lists.

Another problem with this kind of data haven is the way network traffic gets transferred around. To make it more anonymous, DC-net technology can be used, but this is very hard to implement.

As of now, I am looking for someone who can help me implement a redundant controller system, so that when the DH is contacted, even if the first controller is smashed, the "RAID" stays operable. Currently, the data haven program just wakes up on input from a .forward file into its stdin and acts on it.
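For readers unfamiliar with the DC-net idea mentioned above, here is a minimal sketch of a single one-bit round of Chaum's dining cryptographers protocol. It is a toy under loud assumptions: every pair of participants already shares a secret coin, exactly one participant sends, and nobody cheats or disrupts the round. The function names `dc_net_round` and `decode` are made up for this example and have nothing to do with the reference implementation described above.

```python
import secrets
from typing import List


def dc_net_round(n: int, sender: int, message_bit: int) -> List[int]:
    """Run one round among n participants; return each public announcement."""
    # shared[i][j] is a secret coin known only to participants i and j.
    shared = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared[i][j] = shared[j][i] = secrets.randbits(1)

    announcements = []
    for i in range(n):
        bit = 0
        for j in range(n):
            if j != i:
                bit ^= shared[i][j]       # XOR of all coins this party shares
        if i == sender:
            bit ^= message_bit            # only the sender mixes in the message
        announcements.append(bit)
    return announcements


def decode(announcements: List[int]) -> int:
    """XOR of all announcements: every coin appears twice and cancels out."""
    result = 0
    for bit in announcements:
        result ^= bit
    return result


if __name__ == "__main__":
    for bit in (0, 1):
        assert decode(dc_net_round(5, sender=2, message_bit=bit)) == bit
```

Everyone learns the message bit, but the announcements alone do not show which participant flipped theirs. A usable system would also need key agreement, collision handling when two parties send at once, and protection against disrupters, which is a large part of why this is hard to implement.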
RE: Your distributed Data Mockup. If this ever gets to the point where it might be implemented on a commercial basis, please let me know. It may mesh well with a project I am working on and may be profitable for you. -- I hate lightning - finger for public key - Vote Monarchist unicorn@schloss.li
-----BEGIN PGP SIGNED MESSAGE----- In article <3212B0CB.1AB7@deltanet.com>, Kevin Stephenson <cts@deltanet.com> wrote:
DigiCash will never work in my book because it requires an account,
Actually, it doesn't... It's just that in order to use DigiCash's ecash without an account, you need a slightly cooler client, which (AFAIK) nobody's gotten around to writing (yet). - Ian "you'll have to wait until after Crypto..." -----BEGIN PGP SIGNATURE----- Version: 2.6.2 iQCVAwUBMhOZvEZRiTErSPb1AQFlZAP7B7jpZguOk0vA30pkgY6W17SHf/F8ik1/ SOWYiYdSzZ9go9BhoMQyyF68EzzUgwtsqlD3RAU31eMIqMrsAKaHDwp8bMHo7wUc FgQZtMniJlPj1oukLegFpueDAEcKhM+HDaYehgeKvf24CSlw3o6vi1li7x4R1GKc 22aco7e6/s4= =W86h -----END PGP SIGNATURE-----
participants (5)
- Black Unicorn
- Douglas R. Floyd
- iang@cs.berkeley.edu
- Kevin Stephenson
- Stephan Schmidt