
----- Original Message ----- From: "Jim Choate" <ravage@ssz.com> To: <cypherpunks@einstein.ssz.com> Sent: Wednesday, August 08, 2001 7:05 PM Subject: CDR: Re: Mixmaster Message Drops
The next major question is to determine where the drops are happening. Inbound, outbound, inter-remailer, intra-remailer?
That matters from a correction view but not from a usage view, which I assume we're taking. Basically we don't care what technology the remailer uses as long as it is correct technology and trustable. From there we care only what remailers are disfunctional and which are useful.
One aspect of this, assuming the remailers are under attack and that is the hypothesis we are going to assume, is that we need to be able to inject traffic into the remailer stream anonymously. Otherwise Mallet get's wise to what is going on and starts playing us.
Well assuming that the remailers are under attack, we start using digital signatures with initiation information stored in them. Mallet can introduce duplicates, but the likelihood of a duplicate being detected rises very quickly, (i.e at a rate of 1-(1/20)^M for M duplicate messages assuming a drop rate of 1 in 20). This gives us the ability to discount the vast majority of what Mallet does and get very close to accurate values. The bigger risk is for Mallet to identify our queries and force the proper functioning of the node exclusively for the query. Correcting this is much more difficult, but would only take the use of digital signatures and encryption on all the messages traversing the network. Since the remailer user inherently a more developed user than Joe (l)User this is much more reasonable. But still approaches impossible because the remailer users is a finite set so Mallet could store all the remailer user keys, and treat them differently from the query keys. This becomes extremely difficult as long term keys are defeated as well as ephemeral keys. Instead the remailer users will have to maintain statistics, or at least a large unknown portion of them. If users upload to say freenet once a month the number of anonymous messages they have sent and recieved (without mention of timeframe except implicitly month) we could get an overall droprate, and the users wouldn't have to reveal who they are.
If at all possible all measurements should be made anonymously and as stealthily as possible.
Agreed I was beginning to adress this above, it still has some major problems.
Q: How to inject traffic into the remailer network anonymously?
through a set of trusted remailers, if those remailers are trusted and are used for test initiation, then the exact droprate from that entry point will be known. This will build a reputation for those remailers making it desirable for trustable remailer operators to be in that set by increasing the number of messages, leading to better security by initiating from the trusted list.
Q: How do we measure the input/output flow without collusion of the operator?
You count the messages in and the messages out, you don't care what they say, where they're from etc, the operator doesn'tr even need to know you're doing it. Of course this is a rather difficult task, the better option would be to test the network as a whole, by colluding of users to collect statistics on their own messages going through, this would defeat much of what Mallet could do because the test messages would be real messages that are being propogated through.
Q: Where are the computing resources to munge resulting flood of data over at least a few weeks time period. How do we hide this 'extra' flow of data? It represents an opportunity for incidental monitoring due to load usage.
Wouldn't be that bad. Treating the network as a function of it's entry-point seems easiest. Then it's just a simple fraction which can be published raw or you can waste 4 seconds on a 1GHz machine and compute the values. Either way it's not compute intensive, most of the work needs to be done by legitimate users with legitimate messages (to prevent Mallet from playing with the messages).
Q: How do we munge the data? What are we trying to 'fit'?
We are trying to determine the best entry-point for anonymous remailer use as measured by percentage of messages that reach their destination, as filtered by being "trusted".
Q: Once we have the data and can (dis)prove the hypothesis, then what?
Then we only trust the servers on the "trusted" list, and we use the best remailer from the list in terms of delivery. This will encourage individuals that run worthless remailers to improve their systems, eventually leading to the dropping of only a handful of messages a year. Joe