Jurisdictionless Distributed Data Havens

17 Dec 2003

      With all of the debate on offshore data havens, I've been thinking of a
way to implement a distributed data haven that would not be subject to
(hopefully) any local jurisdiction and therefore, would not need to be
located in the place of cypherpunks wet dreams ;). If this idea has been
mentioned by someone else, I am not aware of it. By distributing the
data over several servers (using RAID like striping), compromise of a
single or multiple servers (depending on implementation specifics) would
not cause a collapse of the data haven.

Here is a way such a data haven could be setup:

The data is split into multiple parts. Each site is responsible for
maintaining only its part. For a client to access a piece of data, he
will have to contact a certain number of sites to reconstruct the data.
The client could also use some sort of anonymizer service to do the data
collection and then do then combining on his local machine.

If one site gets shut down by Big Brother, denial of service attack,
etc., the other sites either find a new site to replicate the missing
part to, or they need to reconstruct the data and re-stripe it for the
remaining servers. (Read any networking book with a section on RAID to
see what all this means.)

Another idea is that the data can be encrypted, and the client pays the
key-holder for the key. The key holder would preferably be the content
provider and not one of the servers. i.e.: Client finds Bombs R Us and
wants to buy pipe bomb instructions. Bombs R Us gets anonymous payment
from client for the instructions. Bombs R Us says to client: "collect"
page 5 "parts" and use this key to decrypt. In this scheme, using
anonymous digital cash, Bombs R Us can remain anonymous with his data
publicly available but encrypted. He can pay each of the server
maintainers in anonymous digital cash as his expense. Each server cannot
(or should not) be held responsible for disseminating bomb making
instructions because each server does not *have* the instructions in a
complete form (encrypted or otherwise). It would be like someone calling
the cops and saying, I placed a box with XXXXX in it at the airport.
This in itself is not a threat. XXXXX could be anything, including OJ
Simpson's bloody clothes. And if another guy called in and said "pipe
bomb" and hung up, this is meaningless also.

The reason data striping is better than a simple mirroring network is
that no single site contains anything useful in itself for the
authorities to use against the server maintainer. (Similar to a remailer
network perhaps)

An extra feature could be if some major attack was initiated against the
data haven, there could be a dead-man button of some sort to make the
data vanish altogether by sending distress signals to the other servers
(or to at least one server, which could then cascade the signal).

The system could either use RAID Level 5 data striping, or some hybrid
scheme like this:

Site 1 has bits 1 and 5 of data
Site 2 has bits 2 and 6 of data
Site 3 has bits 3 and 7 of data
Site 4 has bits 4 and 8 of data
Site 5 has 2 parity bits (1 parity bit per nibble)
Site 6 has bits 1, 2, 3, and 4
Site 7 has bits 5, 6, 7, and 8
etc.

In this particular scheme, it would take the downing of at least 3 of
the 7 servers to prevent data collection. It would also take at least 2
sites worth of data to reconstruct the data into a usable form. As much
or as little redundancy as needed can be built into the system.

The only time the data is in it's whole is prior to being stripped
across the servers, during data recreation and re-striping, and when in
the hands of the client.

Benefits:

* No single site contains any incriminating evidence

* Allows for a true "virtual company" to exist, with just a mail-box to
receive it's anonymous digital cash payments

Problems:

* Requires a re-striping of data when the data source is changed

* Requires a data regeneration and re-striping when a site goes down and
a replacement site can not be found (could possibly use an idle standby
server to circumvent this problem)

Attacks:

* Government sabotages site 1, then watches site 2 for data regeneration
before re-striping (Can be thwarted by having data regeneration
happening on a randomly picked server)

Clarifications:

When I say digital cash, I am not referring to Chaum's DigiCash.
DigiCash will never work in my book because it requires an account,
amongst other reasons. YMMV.

Comments welcome.

Kevin Stephenson

-Silence is Security
	WWIIish Poster

Kevin Stephenson

Stephan Schmidt

Douglas R. Floyd

Douglas R. Floyd

Black Unicorn

iang＠cs.berkeley.edu

tags

participants (5)