Remailers: The Next Generation

Timothy C. May tcmay at netcom.com
Fri Jan 21 13:08:14 PST 1994


Cypherpunks, 

Here's a long article on some issues surrounding the "next generation" of
remailers, hopefully a closer approximation to the digital mix. I hope you
folks will add ideas, comment on this article, critique it, etc.

This article includes:

* discussion of the need for a second generation of remailers

* ten basic features needed to better approximate the ideal digital mix

* material on reputations and market systems that cryptologists ignore (the
blend of economics and crypto is a fertile hybrid, one that solves for
practical uses many of the problems as-yet-unsolved with pure cryptography)

* suggestions for a series of agreements needed on message formats, digital
postage (or some variant), and other things to make a second generation
ecology of remailers more useful

INTRODUCTION

The recent experiences with Detweiler beginning to use Cypherpunks
remailers (what took him so long?) point out some weaknesses of the
current overall architecture which we've known about for a long time. We
always knew the first generation of remailers, operational since circa
November 1992, was far from optimal. Traffic analysis would be relatively
trivial for any motivated agency with access to Internet traffic to do
(e.g., most messages flow into a site and then out immediately, and also
have characteristic packet sizes), and the remailers are far from meeting
even the basic standards laid out in David Chaum's 1981 paper on digital
mixes. 

I suspect most users don't even do any encryption at all, let alone nested
encryption, so the origin-destination information is trivially recoverable.
How to change this for the better depends on a number of things: faster and
easier to use PGP, scripts which can take the various remailers and
generate valid paths through the labyrinth of sites, and cultural factors.

Also, the existing remailers are sensitive to abuse, both in "flooding"
sites and mailing lists with junk mail, and in death threats, harassment,
etc. Stopgap measures, such as excluding Detweiler as an origination
address (for the first chain in a remailer, or later, if he failed to use
encryption), are obviously not a robust solution. Flooding is best solved
with some form of "user pays" type of payment system, which we call
"digital postage"; this could use a basic form of prepaid "digital postage
stamps" (e.g., 20-digit numbers) which are bought in "rolls" (I'll mention
some ideas later) and used _once_. (Yes, this scheme is weak, but it's more
than we have now, and it may be useful anyway.)

The first generation remailers were a fantastic experiment, and became
operational very quickly through the Perl-hacking efforts of Eric Hughes
and others. The enhancements added by Hal Finney, Eric Hollander, Matt
Thomlinson, Miron Cuperman, Karl Barrus, and others (sorry if I left some
names out, or miscredited these folks with having added functionality!)
were impressive. But the basic architecture, the "ecology of remailers" is
showing some serious faults and limitations.

Detweiler's attacks and threats to attack are actually fairly mild compared
to what is possible and what may be coming soon. We shouldn't be wailing
about "abuse" of our remailers when the basic architecture and current
features are so lacking. We may succeed in getting Detweiler blocked at
Colorado State--not that I am advocating this--or in doing some basic
source-screening, but this is not a robust solution. Consider this a
wake-up call. Actually, I'm flabbergasted that it's taken so long....I
expected the first generation system to "break" a long time ago.

It is probably time to seriously think about a "second generation
remailer," incorporating the various ideas discussed in the past 15 months
on this list. 


FEATURES NEEDED IN A SECOND GENERATION REMAILER:


I. DIGITAL POSTAGE, so that the user pays for his use. (This reduces
"flooding" and provides a profit motive for "Mom and Pop" remailers, to
make remailers more ubiquitous. More on this later. Late note: This article
ended up way too long, so I'll defer the discussion of digital postage to
another time.)

II. JUNK MAIL SCREENING. Support for "Don't send anonymous mail to me"
registries, with a database maintained (for a fee?) of sites that wish no
anonymous mail. (I'm not at all sure how best to do this...)
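
One simple way such a registry check might look is sketched below in Python. The registry format (a set of lowercase addresses and domains) and the function name is_blocked are assumptions made for illustration only; nothing like this is an agreed design.

```python
def is_blocked(recipient, registry):
    """Return True if the address or its domain opted out of anonymous mail.

    `registry` is assumed to be a set of lowercase strings, each either a
    full address ("alice@example.com") or a bare domain ("example.org").
    """
    address = recipient.strip().lower()
    domain = address.rpartition("@")[2]   # text after the last "@"
    return address in registry or domain in registry


registry = {"alice@example.com", "example.org"}
print(is_blocked("Alice@Example.com", registry))  # True
print(is_blocked("bob@example.com", registry))    # False
```

A remailer would run this check on the final destination before sending, and drop or bounce the message if it returns True.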

III. IDEAL DIGITAL MIX. A closer approximation to the "ideal digital mix"
(a la Chaum's 1981 paper and the various later DC-Net embellishments) is
needed. This is a _huge_ discussion area, one we have touched upon several
times. In particular, Hal Finney wrote up a nice summary of the issue about
half a year ago, maybe longer; he may want to repost his summary if this
thread generates any interest.

What follows is my own far from complete summary of some key features:

- variable message latency, L, set either as policy by remailer site ("this
site sets latency = L = 20 messages") or by the message itself (i.e., user
sets, and perhaps pays for, a latency of his own choosing, such as "wait
for 60 messages before sending")

(Note: I strongly favor letting the _user_ pick the latency time, when
possible, not having it "hardwired" into the site itself. Several reasons
for this: doesn't commit the site to a particular latency, allows more
diversity, lets user pay for more latency, etc.)

- quantized message lengths, to defeat traffic analysis based on watching
packet sizes. We've talked about quantizing message lengths as "short" (2K
= 1 screen full of text), "medium" (10K = a 5-screenful typical article),
"long" (30K), and so forth. The number of quantization levels affects the
overall security of the system, of course. Too few levels unnecessarily
pad shorter messages out to longer lengths; too many levels make traffic
analysis easier, all other things being equal.
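
A minimal sketch of padding to quantized sizes, in Python. The three size classes come from the discussion above; treating "K" as 1024 bytes, the 4-byte length prefix, and the function names are assumptions for illustration, not an agreed format.

```python
import os

# Size classes from the text: "short" 2K, "medium" 10K, "long" 30K.
# Treating K as 1024 bytes is an assumption.
BUCKETS = [2 * 1024, 10 * 1024, 30 * 1024]


def bucket_for(length):
    """Return the smallest size class that can hold `length` bytes."""
    for size in BUCKETS:
        if length <= size:
            return size
    raise ValueError("message exceeds the largest size class; split it first")


def pad(message):
    """Prefix the true length (4 bytes, hypothetical framing), then add
    random filler up to the size class."""
    body = len(message).to_bytes(4, "big") + message
    target = bucket_for(len(body))
    return body + os.urandom(target - len(body))


def unpad(padded):
    """Recover the original message using the length prefix."""
    n = int.from_bytes(padded[:4], "big")
    return padded[4:4 + n]
```

Padding would be applied before each layer of encryption is sent, so that an observer sees only one of a few packet sizes.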

Digression on Diffusivity of Remailers: A careful analysis of "diffusivity"
in remailers--roughly, how many possible paths a message may have taken--in
terms of number of remailer hops, latency at each hop, and packet size
needs to be done. As a very simple example, suppose there are 30
operational remailer sites, all with roughly the same functionality (not
what we have now!). A message entering the "labyrinth" (my name for the web
of remailers) may go to any of these 30 remailers, wait until, say 20
messages of the same length have accumulated (a situation very different from the
current situation, where low volumes and demands for speedy response mean
there's almost *zero* latency), and then be sent to any of the remaining
remailers (or even itself, in a tricky move of simply not sending the
message). After N such remailings amongst M remailers with a latency of L
messages, a rough measure of the diffusivity is:

D = diffusivity = number of paths the original message may have taken

  = L ^ N  (i.e., the diffusivity rises exponentially with the number of hops)

(This is a simplistic equation, which does not take into account the
practical limitations of there being only so many total messages flowing in
the system, a point addressed briefly below. If only 10 messages "enter the
system" and 10 messages "leave the system," the attacker has an easier
problem than a D = 3125, for example, might otherwise suggest.)
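
The naive estimate is easy to compute. The short Python sketch below does so, and also applies one crude correction: capping D at the total number of messages in the system. That cap is an illustrative assumption only, not a derived result; a proper combinatorial analysis would replace it.

```python
def naive_diffusivity(latency, hops):
    """D = L ^ N, the simple estimate from the text."""
    return latency ** hops


def capped_diffusivity(latency, hops, total_messages):
    """Crude correction (an assumption): an observer never has more
    candidates to follow than there are messages in the system."""
    return min(naive_diffusivity(latency, hops), total_messages)


print(naive_diffusivity(5, 5))        # 3125
print(capped_diffusivity(5, 5, 10))   # 10
```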

M = number of remailers is not critically important when M is fairly large.
For example, if M = 1, the solution is trivial. If M = 5, and N > M, this
means the same remailers were used multiple times (recirculating), and the
diffusivity is still quite high. If M is very large, with N < M, the
situation is even better and we can ignore M. In the limit, M will tend
toward infinity (we hope).

Example situations:

1. Current Cypherpunks remailer situation: L = 1 (most remailers are not
"batching" messages, so L = 1), N = a few hops, if even that.

Thus, D = 1, which means the path through the labyrinth is trivial to find
for anyone with access to packet traffic.

(I'm also ignoring for the moment the _logging_ of remailer traffic, a real
no-no in terms of Chaum's ideal mix, which originally called for
hardware-based mixes which kept no records, and more recently called for
DC-Nets which _could not_ determine sender. A Chaumian mix which meets his
1981 standards is beyond the "second generation remailer" I'm describing
here.)

2. Better use of existing remailers: L = 5, N = 5, dozens of total messages
flowing

Thus, D = 5 ^ 5 = 3125, meaning that a traffic analyst sees 3125 paths to
follow for every original message, crudely. (In practice, the calculation
above is not accurate unless enough total messages are used. In this
example, there are not likely to be thousands of messages flowing, so the
numbers are reduced. These corrections to the equation need to be made....I
haven't done a combinatorial analysis--perhaps it's about time I did.)

This level of diffusivity could be gotten _today_ by using the remailers in
this way:

- pad messages out to quantized sizes (as we have discussed, and some
technical issues of multiple PGP rounds exist)

- set minimum latency to L = 5, for any given quantized size

- send messages through N = 5 hops

- D = L ^ N = 5 ^ 5 = 3125

(That few folks will do this, including me, is a _cultural_ and
_educational_ problem unto itself. Topic for another article.)
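
The batching step in the recipe above (hold messages of a given quantized size until L have accumulated, then send them out in scrambled order) can be sketched in a few lines of Python. The class and method names are hypothetical, and this is only a sketch of the idea under those assumptions, not the code any existing remailer runs.

```python
import random
from collections import defaultdict


class BatchingMix:
    """Hold messages per size class; release a shuffled batch of `latency`."""

    def __init__(self, latency, rng=None):
        self.latency = latency
        # SystemRandom draws from the OS entropy source.
        self.rng = rng or random.SystemRandom()
        self.pools = defaultdict(list)   # message length -> waiting messages

    def accept(self, message):
        """Queue an already-padded message. Return the batch to send once
        `latency` messages of that size are waiting, else an empty list."""
        pool = self.pools[len(message)]
        pool.append(message)
        if len(pool) < self.latency:
            return []
        batch = pool[:]
        pool.clear()
        self.rng.shuffle(batch)          # break the arrival order
        return batch
```

With latency = 5, the first four messages of a given size produce nothing; the fifth releases all five in random order.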

3. Future use of existing remailers: L = 10, N = 5

Thus, the naive estimate of D is L ^ N = 10 ^ 5 = 100,000. 

Of course there are not this many paths to follow, but the goal has been
achieved of _effectively obscuring_ the origin-destination mapping.

Note to Readers: I may be losing some readers here by doing these crude
calculations and making related points, so I will return instead to the
listing of features to consider. (Too bad the Net and the various computers
used can't support a collapsible outline structure!)

End of digression.

Back to the list of features:

IV. NO LOGGING. No logging of in-out traffic should be done. I realize that
many operators wish to do this to debug their remailers and to be able to
deal with abusive messages. But make no mistake about it: This is a serious
flaw!

The sooner we can move away from such logging, the better. And sites which
log should tell users; sites which don't log should as well. (Sites which
log but say they _don't_ are of course the real issue in the long
run....I'll save this interesting topic for another article, maybe. Just be
aware that this kind of "collusion" (not exactly, but this is what the
literature calls related behaviors) is not easily solved with existing
remailers.)

V. HARDWARE-BASED REMAILERS. Remailers which are essentially "hardwired" to
behave in a particular way are the next step to take. Since not many people
want to dedicate a machine on the Net to this, this may take a while. Note
that this might still be possible locally as a cheap machine attached to an
existing machine, via a local network. (Terse scenario: Machine on net gets
incoming mail, passes it to cheap 386 box which runs store-and-forward
remailer functions in simple, semi-hardwired way. Perhaps using remailer
code sold on ROMs (a long-range fantasy, I know) and "authenticated" by
"remailer credentialling" private agencies. Mixed messages then get handed
back to machine on the Net, which sends them out.)

VI. MARKETS. And advertising, reputations, etc.

Various remailers will have varying features:

- latency L (though I think users should be able to request the latencies
they think they need) and any other "pseudo-latencies" added (e.g., a site
may send out packets to other machines and back to _itself_, even if not
requested by the packet itself, as a way to increase inter-site traffic and
add latency...I dub this "pseudo-latency").

- packet quantizations supported

- digital postage fee (ideally, price competition will occur)

- types of encryption supported, etc.

- sources that are blocked (e.g., Detweiler's site) or destinations that
are blocked (e.g., president at whitehouse.gov). (Thus leading to the flaw in
source-filtering I noted at the beginning: all Detweiler, for example, has
to do is find a remailer site that does _not_ block him, and he's off and
running.)

- policies on reported abuse, logging of traffic, etc.

- any other relevant information. 

How users can keep track of this variable information and then make a
selection of which remailers to use is a central issue. Full use of a
remailer system will almost certainly require scripts and automation at the
user site, scripts which select a path through the labyrinth of remailers
based on desired security, cost, and acceptable time delays, and perhaps
other things as well.

I suggest a second generation remailer use an agreed-upon standard format
for summarizing this kind of information, requestable by users or
credentialling agents by sending a message like "::policy" to the site. This
would return a summary of digital postage fees, latencies, packet sizes
supported, PGP parameters, and any other special items. If done according
to a reasonable standard, then scripts could be written to automate this
pinging process and the automatic generation of routes. (Joe User would
decide how much security he wants for what price, would ping the remailers
at some reasonable intervals, and a program would select a set of
remailers, do the envelope-within-envelope preparation, adding postage in
each envelope as needed, and ask Joe User if the plan looks OK to
him...also allowing him to manually (ugh! many dangers of goofs!) add or
delete nodes.)
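
To make the idea concrete, here is a Python sketch of what the user-side script might do with "::policy" replies. The reply layout ("key: value" lines with latency, postage in cents, and supported sizes) is entirely hypothetical, since no standard exists yet, and the function names are invented for the example.

```python
import random


def parse_policy(reply):
    """Parse a hypothetical '::policy' reply made of 'key: value' lines."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return {
        "latency": int(fields["latency"]),
        "postage": int(fields["postage"]),            # cents per message
        "sizes": [int(s) for s in fields["sizes"].split(",")],
    }


def choose_route(policies, hops, budget, size, rng=None, tries=1000):
    """Randomly pick `hops` distinct remailers that accept `size` and whose
    total postage fits `budget`. Returns None if the search finds nothing."""
    rng = rng or random.SystemRandom()
    usable = [name for name, p in policies.items() if size in p["sizes"]]
    if len(usable) < hops:
        return None
    for _ in range(tries):
        route = rng.sample(usable, hops)
        if sum(policies[name]["postage"] for name in route) <= budget:
            return route
    return None
```

The random search is deliberately simple and can miss a valid route that exists; a real script would also weigh latency and reported reliability, and would show the plan to the user before wrapping the envelopes.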

VII. STANDARD FORMATS. The item above points to the need for a standard
format, to be decided upon, for all of the features mentioned here. Where
in the message body (or headers, though I favor message body, for reasons
of encrypted packets within encrypted packets....) is the digital postage
to be included? (This could vary from remailer to remailer, but a standard
would make things simpler. Anyone deviating from the standard would be free
to do so, of course, but this would make scripts to generate paths tend to
avoid his site...a market solution.)

I won't speculate as to what form this should take. Perhaps we need to have
a "working group" on the Cypherpunks list, made up of the real workers out
there. Even a physical meeting that as many folks as possible can attend.


VIII. RATINGS AGENCIES. Independent agents that report on which remailers
are "up," which are experiencing delays and problems, what the policies
are, and what the experiences have been.

This is part of an ecology or economy of mixes and could also use some form
of digital money, or digital postage stamps to pay for these reports. These
"reputation servers" would give us several useful functions:

1. More of a market, as in VI (MARKETS).

2. Faster feedback, as remailers see problems reported quickly. Users can
see a snapshot of which remailers are up, which are not. (If a reasonable
standard for the report is established, users can plug into this report for
routing messages. In fact, the various ratings agencies--initially I'd only
expect one or two to appear, if that--could also sell scripts/programs
which work with their report formats.)

3. Another prototype use of some simple form of digital money.

4. Incentives for better performance, security, and standardization on a
message format.

5. Performs both a lubrication and a glue function (how's that for mixing
two opposite ideas?) of publicizing information. Increases liquidity,
decreases transaction costs, making the remailers easier and more reliable
to use.

The work by some on "black pages" (crypto equivalent of "yellow pages") is
a step in this direction. The "key servers" which have PGP keys could be
paralleled by "remailer servers" which summarize remailer information, ping
results, user feedback, etc.
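
As a rough illustration of what a "remailer server" might compute from ping results, here is a small Python sketch. The record format (pairs of remailer name and round-trip delay in seconds, with None for a test message that never came back) and the report fields are assumptions for the example.

```python
from statistics import median


def summarize(pings):
    """Summarize ping results per remailer.

    `pings` is a list of (remailer, delay_seconds) pairs; delay is None
    when the test message never came back (hypothetical record format).
    """
    by_site = {}
    for site, delay in pings:
        by_site.setdefault(site, []).append(delay)
    report = {}
    for site, delays in by_site.items():
        returned = [d for d in delays if d is not None]
        report[site] = {
            "up_fraction": len(returned) / len(delays),
            "median_delay": median(returned) if returned else None,
        }
    return report
```

A report in some agreed format like this is what route-selection scripts could plug into.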


IX. DIVERSE SITES. We need more sites which are outside the U.S., more
which are independently owned (i.e., not running on a university or
commercial service provider), and more which are otherwise "untouchable"
and not subject to pressure.

(Aside: I also think we need "virtual sites" which are themselves only
accessible by remailers. For example, a node called "TIM," running on my
Netcom account, might actually link in a path known only to _me_, to a site
elsewhere. Users would mail to "TIM," but the messages would flow
transparently to some other site, perhaps still located in the U.S.,
perhaps not. From an abstract point of view, this is no different than the
"pseudo-latencies" I mentioned earlier, and can be viewed as just a bunch
of extra hops in the chain of "first class object nodes," but in my opinion
it alters the flavor slightly and makes any publicly visible site, like
"TIM," more resistant to attack and shut-down, or at least to seizure of
the actual mix itself. Other names for these sites might be "sacrificial
sites" or "digital cutouts" (a cutout in spy lingo is a person who relays
information, an expendable link).) 
 

X. ATTEMPTS TO BREAK REMAILERS. Just as cryptography is incomplete without
cryptanalysis, so mixes are incomplete without serious attempts to crack
them, to spoof them, to subvert them. This breaking does not have to be of
the "public disaster" sort, that is, we don't have to "squish" a site by
successfully getting a threatening message sent to Janet Reno! Rather, I
favor a "tiger team" approach, where the breakages are useful to the operators.

(The ratings agencies would likely play a role here, reporting on their own
experiences, the experiences of their customers, and the results of, say,
independent "tiger teams" sent in to try to break the systems.)

There are obviously things few of us can hope to do: the NSA may have
extensive Internet packet monitoring facilities (a speculation) that we
cannot hope to have, or to spend time to develop. Ditto (squared) for
covert monitoring of Van Eck emissions (breaking systems by monitoring
local computer emissions). Brute force attacks on ciphers. And so on. So
let's not kid ourselves that we can break the systems in all the ways the
real world will try.


CLOSING COMMENTS: 

Well, these are some basic ideas. A tall order to incorporate these into a
second generation set of remailers. But necessary if remailers are to take
off and thrive. The addition of the profit motive, by charging for
remailing in some way, I view as particularly important in incentivizing
progress and proliferation, as well as in reducing "tragedy of the
commons" types of remailer abuses.

As this message is already so long, I won't elaborate here, as I promised
earlier, on how simple digital postage could be deployed. The idea is the
one we've discussed before: sell 20-digit numbers for perhaps 20 cents
apiece, in "rolls" of 100 or so. The numbers would be spendable _once_,
perhaps only at the site which issued them (more like a gift certificate).
There are obvious weaknesses in such a system, but it may be usable for
relatively cheap transactions like remailers. I'll leave it to readers to
think about the issues and will perhaps address them in another article,
after I've recovered from writing this one!
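
For readers who want something concrete to pick apart, the simple scheme just described (20-digit numbers sold in rolls, each spendable once at the issuing site) might look like the Python sketch below. The class and method names are hypothetical, payment for the roll is not modeled, and the weaknesses mentioned above (for example, the issuer can link a stamp to its buyer) all remain.

```python
import secrets


class StampIssuer:
    """Issues and redeems single-use 20-digit postage stamps."""

    def __init__(self):
        self.unspent = set()

    def sell_roll(self, count=100):
        """Return `count` fresh stamps and remember them as unspent."""
        roll = []
        while len(roll) < count:
            stamp = "".join(secrets.choice("0123456789") for _ in range(20))
            if stamp not in self.unspent:
                self.unspent.add(stamp)
                roll.append(stamp)
        return roll

    def redeem(self, stamp):
        """Accept a stamp exactly once; reject unknown or reused stamps."""
        if stamp in self.unspent:
            self.unspent.remove(stamp)
            return True
        return False
```

A remailer would call redeem() on the stamp found in each incoming message and drop the message if it returns False.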

I think the first generation of Cypherpunks remailers has been a wonderful
learning experience, but it's time to start planning the next generation.


--Tim May


--
Timothy C. May         | Crypto Anarchy: encryption, digital money,  
tcmay at netcom.com       | anonymous networks, digital pseudonyms, zero
408-688-5409           | knowledge, reputations, information markets, 
W.A.S.T.E.: Aptos, CA  | black markets, collapse of governments.
Higher Power: 2^756839 | Public Key: by arrangement
Note: I put time and money into writing this posting. I hope you enjoy it.