Re: Improved remailer reordering
In message <9408062320.AA17234@ah.com> Eric Hughes writes:
About message mixing:
A measure that is used for situations like this is entropy.
Indeed. This is exactly the mathematical measure for what I've called "privacy diffusion" in a remailer network. It is, namely a measure of of the uncertainty to a watcher of what ingoing message corresponds to what outgoing message.
As soon as you begin to write down some of the equations for this value, several things become distinct possibilities:
-- duplicate messages may decrease security -- retries may reduce security -- interactive protocols may reduce security -- there is such a thing as a needlessly lengthy remailer path -- noise messages might not be worth the bother -- multiple different routes may reduce security
One thing becomes blaringly obvious:
-- it's reordering that's mathematically significant; that's what goes directly into the equations.
On thing is glaringly obvious: if you use the wrong assumptions, you will get the wrong answers. Imagine a RemailerNet (v0.2) that maintained a fixed level of traffic between gateways. Messages are injected into the system at various gateways and emerge at various gateways. All traffic between gateways is encrypted. All traffic takes the form of packets of the same length, perhaps 1024 bytes. [It is possible that a much smaller packet size might be desirable, specifically the ATM packet size, with 48 bytes of data payload.] Messages are fragmented according to policies at the entry gateway. Intervening gateways may or may not further fragment incoming packets according to gateway policy. The exit gateway is responsible for reassembling packets into messages. The routing of packets is randomized to some extent. Message transmission is guaranteed to be reliable in the sense that either the message will get there or the sender will be told that it didn't. Users desiring a high level of security are required to participate in the game. They must accept and send a fixed number of packets at each connection. These users should be responsible for packetizing their own messages when sending and assembling their own messages when receiving. They must encrypt all communications with gateways. These 'empowered' users are in fact operating RemailerNet gateways. It is likely that different levels of gateway would have to be defined, depending upon the degree of physical control that the operator had over the gateway and the level of resources that he or she was willing to devote to RemailerNet operations. Entry level users would communicate using ordinary email. Major gateway operators would communicate using RemailerNet protocols over TCP/IP. Time is measured in this system in steps. Each step corresponds to the dispatch of one set of packets. The relationship between 'step time' and chronological time will vary from link to link. This system will tolerate an arbitrary level of traffic. Over time the level of traffic (in bytes/sec) would be some multiple of the average volume (bytes/sec) of messages carried. The gateways would automatically adjust the traffic level. [Probably it should rise quickly and fall gradually.] The functioning of the system as a whole makes it very difficult to do any kind of realistic traffic analysis. Any reordering of messages is performed at the packet level. In general, the messages do not exist as wholes along the lines connecting the gateways, so a discussion of their reordering is a good way to waste time. A detailed mathematical analysis of what makes the system difficult to attack would itself be quite difficult. But I would suggest that the key factors are the fragmenting of messages, the use of fixed length packets, the systematic introduction of noise, and random delays in dispatching packets. [The random delays reorder the packets and they also introduce noise -- an unused timeslot is filled with a noise packet.] If, of course, your equations include only measures of the reordering of messages, your results will depend only upon measures of reordering of messages. -- Jim Dixon [this is not a complete or final description of RemailerNet] [v0.2 but should be sufficient to encourage a few attacks ]
Imagine a RemailerNet (v0.2) that maintained a fixed level of traffic between gateways. This is exactly what I was talking about when I posted earlier about link encryptors, and effective collapse of nodes for traffic analysis purposes. Traffic analysis of mixes and remailers assumes, as an abstraction, that all the messages going into and coming out of a particular node are visible. As soon as you remove this condition, the analytical situation changes completely. And it changes for the better, since the reduction in observed information can only improve security. Message arrival and departure times are not irrelevant, and their removal gives less useful information. The desired net result is a single node for traffic analysis purposes. But even for a single node, estimates of reordering still need to be made. The problem with implementation of link encryption is, like everything else, cost. Link encryption off the Internet requires dedicated lines. Link encryption on the Internet likely won't get you into trouble now, but likely will be an issue as subsidies go away. In general, the messages do not exist as wholes along the lines connecting the gateways, so a discussion of their reordering is a good way to waste time. You still have to worry about reordering in the network as a whole. The system you've described has reassembly done at the endpoints, who might not be the final receiver. I pass over the flaw of lack of message quantization in the final sending of reassembled messages. We may assume for discussion that they're all the same length. Now, you still need to calculate the likelihood that a particular outgoing message is the same message as a particular incoming message. These probabilities have to do with message reordering. You still need to do the calculation. Eric
hughes@ah.com (Eric Hughes) writes, quoting Jim Dixon:
Imagine a RemailerNet (v0.2) that maintained a fixed level of traffic between gateways.
This is exactly what I was talking about when I posted earlier about link encryptors, and effective collapse of nodes for traffic analysis purposes. Traffic analysis of mixes and remailers assumes, as an abstraction, that all the messages going into and coming out of a particular node are visible. As soon as you remove this condition, the analytical situation changes completely.
So, I guess what you are saying is, two remailer nodes connected by a fully-encrypted link which carries dummy traffic so the data rate is constant (and hence effectively invisible) can be thought of as one node for some purposes. So let me ask: how does a network which contains these two nodes compare with one which has only a single node in their place? I can see three models to compare. The first is a single node network. The second is a tightly-coupled two-node network with link encryption so no information is available on the traffic between them. Messages will be sent into and out of this pair of nodes in such a way as to maximize entropy of distribution. The third is a loosely-coupled two-node network where the nodes are used as a Chaum-style cascade, but with half the messages going in each direction. For the first network, if the bandwidth into (and hence out of) the single node is N, we get the maximal possible confusion, as I suggested before. If the total bandwidth into the remailer network is N, then the tightly-coupled two-node network might average N/2 into each of the nodes, with N/2 out of each of them. For maximal confusion, half of the incoming data would be sent over to come out of the other node, so we have N/4 going in each direction on the link. The net result is that the two-node net has each node with a bandwidth of 3/4 N coming in (and going out) to attain the confusion level of an ideal one-node system. This is superior in per-node bandwidth but greater in total network bandwidth. As for security against corrupt operators, this gives some improvement over a one-node system, but not as much as with two independent nodes. In this model, only half the messages go through both nodes, so only half get the benefit of a two-node chain. (Also, as I suggested before, we might question whether two node operators who were able to cooperate and trust each other well enough to set up this kind of link would be truly independent.) For the third model, two nodes connected by an ordinary link and used as two-node chains, each node now has an input bandwidth of N: N/2 from users (who choose each node at random as the first of the chain), and N/2 from the other remailer (where the node is acting as the second of the chain). So we have paid a price in bandwidth, with each node carrying N, and a total net bandwidth of 2N. But we have gained in security against operator malfeasance: all messages now go through both remailers and if either one is hiding the mapping then it is lost. So, there appears to be some tradeoffs between bandwidth savings and security against dishonest operators. It will be interesting to see how these results extend to larger numbers of nodes. Hal
participants (3)
-
Hal -
hughes@ah.com -
jdd@aiki.demon.co.uk