Remailer Phases

Joseph Ashwood ashwood at msn.com
Wed Aug 8 15:46:04 PDT 2001


----- Original Message -----
From: "Anonymous" <nobody at paranoici.org>
To: <cypherpunks at lne.com>
Sent: Wednesday, August 08, 2001 4:48 PM
Subject: CDR: Re: Remailer Phases


> An Unknown Party wrote:
> > On Wed, 8 Aug 2001, Anonymous wrote:
> > > We need a good mixmaster net.
> > >
> > > working remailer:
> > >   1. Average latency less than 5 min
> >
> > Bad. See the papers done on threats of traffic analysis/spam attacks
> > against remailers.
>
> "Average latency" exists.  What do you think it should be?
>
> a) 5 minutes
> b) 5 hours
> c) 5 days
> d) 5 months
> e) longer
>
> I like a).


As has been pointed out, it's not latency alone but latency relative to message
volume that matters. If there are 2 messages a day going through the system,
then 5 minutes is simply not enough; traffic will be completely traceable. OTOH
if there are 5000/sec going through the system, then 5 minutes is arguably
overkill. I think that at the current relatively low level of usage, 24 hours
is the minimum average latency that should be used. Of course this is across
the entire Mixmaster net, where messages are dispersed, entering at any
location and leaving at any location. Based on this I believe that each node
should maintain a list l of known other nodes. It should of course select a
delay time at random, say up to time t. Assuming that the server chooses the
next hop uniformly at random from the |l| possibilities, itself included
(choosing itself means the message exits immediately on timer expiration),
this gives an equation for t in hours: f(t) = necessaryDelay, where
f(t) = t + ((|l|-1)/|l|)f(t); solving for t gives the necessary average
per-hop delay. I don't have the time to solve this right now, but given a
list l of magnitude 100 the value of t will be significantly greater than
5 minutes.
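
Working the recursion through: f(t) = t + ((|l|-1)/|l|)f(t) rearranges to
f(t)*(1/|l|) = t, i.e. f(t) = |l|*t, so t = necessaryDelay/|l|. For a 24-hour
necessaryDelay and |l| = 100 that is 0.24 hours, about 14.4 minutes per hop.
A minimal sketch checking this (the function names and the Monte Carlo check
are my own, not part of the original post):

    import random

    # Rearranging f(t) = t + ((|l|-1)/|l|) * f(t) gives f(t) = |l| * t,
    # so the per-hop delay is t = necessaryDelay / |l|.
    def per_hop_delay(necessary_delay_hours, list_size):
        return necessary_delay_hours / list_size

    def simulated_total_delay(t, list_size, trials=100000):
        # Monte Carlo check: each hop adds t, and the message continues
        # with probability (|l|-1)/|l|. The mean should approach |l| * t.
        total = 0.0
        for _ in range(trials):
            delay = t
            while random.randrange(list_size) != 0:
                delay += t
            total += delay
        return total / trials

    t = per_hop_delay(24.0, 100)          # 0.24 hours
    print(t * 60)                         # ~14.4 minutes, well above 5
    print(simulated_total_delay(t, 100))  # ~24.0 hours end to end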

So the remaining question is what value to use for necessaryDelay. This is
also dependent on the number of known nodes. All nodes must be equally likely
to be chosen for the transfer, for obvious reasons. Based on this I believe
that necessaryDelay needs to be greater than the time needed to receive |l|
messages. The reason is fairly simple: at the extreme we have only one
message going through the system at a time, which is obviously bad; an
observer simply watches the output of the system, and what comes out is what
they are looking for. With at least |l| messages going through a node and a
delay long enough to collect them (note that as the magnitude of l increases
the entire system slows, which could be bad; I'm sure I'm missing something
that will dominate on scaling), each message can be mistaken for the others.
Since it is to be expected that remailer usage will increase at least as fast
as the size of l, the latency will likely decrease over time.
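
To make the dependence on traffic concrete, here is a small sketch under my
own assumptions (the arrival-rate parameter is hypothetical; the post does
not specify one): if messages arrive at roughly r per hour, the time needed
to receive |l| messages, and hence a floor on necessaryDelay, is |l|/r hours.

    # Sketch (assumptions mine): necessaryDelay must exceed the time
    # needed to receive |l| messages, so at r messages/hour the floor
    # is |l| / r hours.
    def necessary_delay_floor(list_size, msgs_per_hour):
        return list_size / msgs_per_hour

    # 100 known nodes at 10 messages/hour: a 10-hour floor. At 5000/sec
    # the floor collapses to a fraction of a second, matching the
    # scaling claim above.
    print(necessary_delay_floor(100, 10.0))           # 10.0 hours
    print(necessary_delay_floor(100, 5000 * 3600.0))  # ~5.6e-06 hours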

If there is sufficient demand it is entirely reasonable to reduce the
threshold from |l| messages to a value as low as 2, but I don't believe this
is reasonable at 100 or even 1000 remailers. If remailer usage increases to
the point where more than 20% of email traffic goes through remailers it may
become feasible to lower this limit, but it would probably be unnecessary,
because that much traffic would lower the delays automatically as a matter
of recomputation.

What is surprising is that this can be automatically calculated in a rather
interesting way. If each node still maintains l, it is entirely possible for
a remailer to create a message pool of size |l| and, when a new message
arrives and the pool is full, randomly select one entry to be flushed towards
its destination _prior_ to the insertion of the new message, with an
autoflush happening every sqrt(|l|) hours (perhaps by insertion of a null
message). This would cause a ripple effect each time a message was sent,
which could be seen as a problem by the uninitiated, because there would be a
decided pattern of travel: each message entering the system causes activity
along a random walk. To an amateur this would appear to be a flaw in the
system, except that the message sent by the i-th node is not the message sent
by the (i-1)-th node, so the risk is non-existent. Since the average path
length is going to be k = 2((|l|-1)/|l|), and the random walk is going to
choose from |l|^k paths, which we can approximate by |l|^2, this offers a
sufficient growth rate to block tracing. If this growth rate is unacceptable
we can also add a minimumHops value to the protocol, increasing the number of
paths to |l|^minimumHops + |l|^2. minimumHops should be chosen to be a
suitably large number; based on current assumptions I would recommend
minimumHops = log_|l|(2^128), making the |l|^2 term only a footnote, as the
total would be greater than 2^128, giving an enormous difficulty in even
selecting a duplicate path.
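
A minimal sketch of the pool behaviour just described (the class name and
representation are mine; a real remailer would forward along the message's
own chosen route rather than the random next hop used here for illustration):

    import math
    import random

    class PoolNode:
        def __init__(self, known_nodes, send):
            self.known_nodes = known_nodes   # the list l
            self.send = send                 # send(message, next_node)
            self.pool = []                   # holds up to |l| messages
            # autoflush interval: sqrt(|l|) hours
            self.autoflush_hours = math.sqrt(len(known_nodes))

        def receive(self, message):
            # When the pool is full, flush one random entry towards its
            # destination _before_ inserting the new arrival, so the
            # message that leaves is never the message that just came in.
            if len(self.pool) >= len(self.known_nodes):
                victim = self.pool.pop(random.randrange(len(self.pool)))
                # Illustration only: random next hop. In practice the next
                # hop comes from the message's own routing header.
                self.send(victim, random.choice(self.known_nodes))
            self.pool.append(message)

        def autoflush(self):
            # Run every self.autoflush_hours; inserting a null (dummy)
            # message triggers a real flush whenever the pool is full.
            self.receive(None)

    # Example wiring: node = PoolNode(["mix%d" % i for i in range(100)],
    #                                 lambda m, n: print("->", n))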

There are mitigating factors, however: each message can only exist in one of
|l| locations at any moment, so the maximum difficulty of guessing is still
bounded in that fashion, leaving reasonable values for minimumHops at around
10 for a 100-node network.
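
For concreteness (my arithmetic, not in the original): log_|l|(2^128) =
128/log2(|l|), which for |l| = 100 is about 19.3, so minimumHops = 20 would
be needed to push the path count past 2^128; the bound of |l| simultaneous
locations is what allows relaxing this to around 10.

    import math

    # minimumHops = log base |l| of 2^128 = 128 / log2(|l|)
    def minimum_hops(list_size):
        return 128 / math.log2(list_size)

    print(minimum_hops(100))  # ~19.3, i.e. 20 hops without the mitigation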
                        Joe




