Remailer chaining results
-----BEGIN PGP SIGNED MESSAGE----- I've done some calculations on the mixing properties of Chaum-style networks and gotten some interesting results. Recall that in a Chaum-type remailer network users use nested encryption and remailing instructions to set up a chain or "cascade" of remailers. Each remailer strips off the encryption envelope and sees the address of the next remailer in the chain or, for the final remailer, the ultimate destination. All messages are the same size and carry no distinguishing features. We assume that the opponent is monitoring all messages traffic into and out of all remailers on the net but can't see what is happening within each remailer. Let's take a concrete example and suppose there are four remailers. Everyone sets up a chain of 2 remailers, chosen at random from these four. A batch of messages is received by each remailer, which strips off the envelope and sends them on to the next remailer in the chain, where they are mixed with the other messages which chose that remailer as the 2nd in the chain, then sent out to their ultimate destinations. This model is a little artificial in that we are assuming a certain amount of synchrony of the operation of the various remailers for simplicity. (Note that for this four-node network there are twelve possible two-node chains where the nodes are different.) There are three measures that I am interested in: bandwidth used (the less the better); message mixing (the more the better); and immunity to subversion (the more the better). For bandwidth we can measure the flow through the remailer. Due to the symmetry of the situation, the inflow and outflow are equal and the same for all remailers. Message flows per remailer are the sum of the flow into the remailer from outside (the user messages), plus all flows into the remailer from the other remailers. Mixing can be measured by a probability distribution over the outgoing messages which represents how likely they are to be a given incoming message. For simplicity this can be expressed simply as the number M of messages which are equally likely to be the original (in an earlier message I used entropy which is a log measure of the same thing). I am thinking of measuring immunity to subversion in terms of how much mixing is lost by a certain number of "failed" (that is, subverted) nodes. Some networks are vulnerable to "single point failures", where a single subverted node destroys all the anonymity. A more robust network would require multiple failures for this to happen. However, it turns out that even in a multiple-failure network a single-point failure may reveal some information about the messages, which we can express as a loss in mixing. Let the total message bandwidth into the network be N packets per time unit. Due to symmetry, each node will receive N/4 packets. With the chains as defined above, the other three nodes will all be equally likely to be the 2nd in the chain, so N/12 packets are sent to each of them. Simultaneously, N/12 packets come to this node from each of the others. This is a total internode bandwidth of N/4 in each direction per node, or N total per direction. Add this internode bandwidth to the user-link bandwidth of N per direction and we get 2N total, or N/2 per node. At the beginning of each chain, we have N/4 packets come in and get mixed as each node. As the packets go out, they are sent to the other three remailers, and when they leave they may be any of the output of those three. Thus they are equally likely to be any of 3N/4 of the packets, and this is the amount of mixing we have. If one of the two nodes in your remailer chain is compromised, it provides no effective mixing. This means that your message is only mixed at one node, where it is combined as part of a batch of N/4, so that is the degree of mixing you have with a single failure. If both remailers are compromised then of course you have no mixing, which we would write as a factor of 1 in uncertainty increase. This can also be expressed in terms of a percentage compromise of the network. If 1 node is compromised, which can be represented as p=.25, then the six of the twelve remailer paths which use that node will have single-point failures with the comcomitant reduction in mixing. In other words, half of the messages will have the full 3N/4 mixing while half have N/4. With p=.50, two nodes are compromised. Two paths are safe, eight have single failures, and two have double-failures. So we have 1/6 of the messages with 3N/4, 2/3 with N/4, and 1/6 with only 1 mixing. With p=.75, three nodes compromised, there are no safe paths; half have single failures and half have double. So 1/2 the messages have mixing of N/4 and half have 1. And of course with p=1 all messages are compromised with mixing factor 1. Let me just go on and extend this analysis in one way. In the discussion of the chains, we have assumed that the two nodes in the chain would be different. Logically though one could have chains where both nodes were the same. Let us compare this network with the one we just did. There are now 16 possible chains. Total bandwidth is somewhat less (since we don't count the messages which stay in one remailer). Now only 3/4 of the messages from each node need to get exchanged. Per node, there will be N/4 messages to users and 3N/16 messages to other nodes, for a total of 7N/16 per node or 7N/4 total (above the 7's were 8's). Mixing is actually improved; there is no limitation on which input messages might map to which output ones, so we have full N-fold mixing (compared to 3N/4 above). With single-point failure mixing is again N/4 as above. The failure behavior is quite different. With p=.25, 1 of the 16 paths is totally compromised, 6 of the 16 have single failures for N/4 mixing, and 9 of the 16 have no failures for N mixing. With p=.50, 4/16 of the paths have mixing 1, 8/16 have mixing N/4, and 4/16 have mixing N. With p=.75, 9/16 have mixing 1, 6/16 have N/4, and 1/16 have N. It's not clear what measure is useful to compare these failure situations. A double-point failure seems much worse than a single one. I wonder whether taking a geometric mean (which would be equivalent to taking the arithmetic mean of the entropies) would be valid. If we did that for the p=.25 case, we get average mixing of .59N^(15/16) for the self-chain network, and .27N for the network where all chains are two different nodes. For N less than about 250,000 packets per (network-wide) batch the self-chain network provides superior average mixing in the p=.25 case by this measure. Sparing the math, for p=.50 the self-chain network is superior for batch sizes smaller than 29 packets, and for p=.75 the self-chain network is only superior for batch sizes smaller than 16 packets. This suggests that if the network is likely to be mostly safe then the extra mixing allowed by same-node chains is worth the small increased risk of exposure. But as the chance of encountering bad nodes rises it becomes unwise to take this chance. Here is a quick summary of the extension of these results to larger numbers of remailers and longer chains. Let there be R remailers and let the chain length be K. Let the number of message packets per batch (network wide) again be N. (I will neglect the differences between same-node chains and different-node chains as they are generally small effects on the order of 1/R.) Bandwidth per node is approximately KN/R. Network wide it is therefore KN. Adding remailer hops increases network bandwidth loads directly in proportion to the number of hops. Mixing is approximately N for K=2 and up, which is the maximum possible. For K=1 mixing is N/R. Fault tolerance is interesting. A K-length cascade is invulnerable to up to K-2 failures! At K-1 the mixing decreases from N to N/R, a significant decrease. And with K failures of course the mixing drops to 1. I was surprised how robust these networks are. The reason is that with even K-2 compromised remailers in a K-length cascade there still remains a safe length 2 cascade, and as we saw above that provides N-fold mixing. This provides some guidelines on the choice of K. First, K should clearly be at least 2. The increase from K=1 to K=2 increases mixing from N/R to N, a considerable increase. Secondly, K should probably be at least 3. This will provide full mixing even if you are unlucky enough to choose a compromised remailer. Beyond this, you can calculate that with a chain length of K and probability p of a compromised node, the expected number of compromised nodes in your chain is Kp. This suggests that you should choose K large enough that Kp is well below K-2. If you estimate p=.50, for example, you might choose K=8. The binomial theorem states that the chance of x failures out of k nodes where the probability of each failure is p is (p^x)*((1-p)^(k-x))*k!/x!(k-x)!. In this example, the chance of 7 failures out of 8 is about 3% and the chances of 8/8 is about .5% for a total risk of 3.5% that you won't be fully protected. Now, how many people read this far? ;-) Hal -----BEGIN PGP SIGNATURE----- Version: 2.1e (yikes, where'd I find this old version!) iQCVAgUBLkbYB6gTA69YIUw3AQHligP+PBRC1pmZ6+T10WCQ91SZ2GdYX4/iEsKQ eMfCLlQ0PFbPEWZ5TaDwbOLCCUSBAbb6OO3Y2U8SHF/zZKJLrHI09/Ssl/ZeQ3st 9G9JrncU9Wo7Z9N1zMPJuQy21qFKNOkAwVQHxThObMSxQWh+TWem8lDKzm6ea0VH sejMQG+nVyo= =BWsP -----END PGP SIGNATURE-----
participants (1)
-
nobody@shell.portal.com