-----BEGIN PGP SIGNED MESSAGE----- [ To: Cypherpunks ## 06/28/96 04:34 pm ## Subject: anonymous mailing lists ]
Date: Fri, 14 Jun 1996 02:15:03 +0000 (GMT) From: Ecafe Mixmaster Remailer <mixmaster@remail.ecafe.org> Subject: Hackerpunks and C2
The proposal for a Hackerpunks nym based mailing list is interesting, however, there are some concerns regarding the susceptibility of the list to traffic analysis.
I was thinking about attacks that can be carried out on remailers in general, and came up with something that is potentially pretty nasty, especially for anonymous mailing lists and people who post a lot of stuff anonymously using ``nyms.'' Let's imagine an anonymous remailer network as a ``black box'' which functions perfectly. Messages (broken into equal-sized packets and strongly encrypted) are sent into the network by the sender, and at some later time, they come out at the receiver. Let's assume there is no possible way for an attacker to trace a message through this network. Now, we still have to deal with two more issues--mail goes into the network, and comes out of the network. At those two points, there is trafic analysis available. Specifically, we can see how much data goes over the line. (Naturally, if it's encrypted, we can't tell how much of it is real data and how much is padding.) Generally, when we're attacking this system, we're trying to figure out either the sender or the receiver of a message (or a sequence of messages), based on what we can observe coming into and going out of the network of remailers. There are basically five scenarios: 1. The sender wants to know who the receiver of his message is. 2. The receiver wants to know who the sender of his message is. 3. An outsider wants to identify the sender of a message. 4. An outsider wants to identify the receiver of a message. 5. One receiver of a message wants to know who the other receivers of the message are. (This is the case for anonymous lists.) Now, there are a couple of different ways these attacks can be carried out. Usually, I've seen people talk about ``tracing'' attacks, in which a message is traced from one side of the network to another, without any clear idea of who might be on the other side. However, I think a more realistic situation is to imagine the attacker trying to test the hypothesis that some person is on the other end of the anonymous transmission. If relatively few people regularly send or receiver anonymous e-mail, then this is practical for many kinds of test. It's even more practical when we're dealing with relatively small populations of interested people in some technical subject. (This is conceptually similar to the ``dictionary attack'' on passphrases.) Basically, what we're looking at, in that case, is some test which (with some reasonably high probability) determines whether some person is the sender or receiver of a given message or stream of messages. This leads to some interesting insights. 1. In reasonably large text messages, it's probably easy to test hypotheses about senders. There are metrics that can more-or-less identify the writer of a piece of prose. While it's no doubt possible to defeat this kind of analysis for some things (i.e., blackmail notes or rigidly-defined messages in a cryptographic protocol), I suspect that this is very hard to defeat for a mailing list where the objective is to discuss serious technical issues. (This kind of analysis also causes headaches for people trying to do strong steganography in text.) 2. If an attacker (i.e., the NSA) logs the total volume of all traffic in and out of the remailer network, and to whom each message came from or went to, then that attacker can probably mount some very powerful hypothesis-testing attacks. It's these attacks I want to discuss. If Alice sends a message to Bob through the remailer network, two things must happen to prevent it from being trivially traceable. 1. The message has to change size. If the message is already encrypted, then compression isn't much of an option--so what's left is padding it out by a random amount. The amount of padding per message is probably a uniformly distributed random variable. 2. The message has to be delayed somewhat. The delay is probably also a uniformly distributed random variable, or possibly the result of adding N such variables, where N is the number of chained remailers. For a single mailing, this is probably not much of a threat. There will be enough ``noise'' in the delay and padding that most transmissions will be masked. However, consider the situation of a mailing-list. Alice and Bob are both recipients of the list. Alice wants to decide whether Bob is receiving the list. Let D be a delay such that, if Alice received her copy at time T, 90% of the other list members received their copy between T-D and T+D. Now, Alice looks at Bob's anonymous e-mail volume during that time span vs. at all other times. If he's receiving the same stuff she is, then there should be an increase within that span of time, on average. The random distribution of the arrival time will mask individual transmissions, but with many messages, it probably will not. (This is conceptually similar to the situation in Paul Kocher's timing attacks--adding some random pauses doesn't hurt the attack as much as most people expect it to, because those random pauses, summed up over many messages, become a normally distributed random variable.) The average amount of anonymous e-mail Bob gets per day doesn't have much effect, nor do occasional worst-cases. The only ways I can see to prevent this attack are either to ensure that Bob gets a constant rate of information from the anonymous remailer network, or to make the arrival time span so large that other randomness in the sample makes the change in volume undetectable. In general, I don't think this second one will work without accepting incredible delays on messages. This can also be adapted to tracing back anonymous posters to newsgroups and mailing lists, when they use a consistent nym. (They could also be traced by textual analysis.) In this case, the attacker starts by posting some anonymous messages (not using a nym--he doesn't need one), to get some statistics on what the average delay is, and also what the average amount of padding is. He may do this for several different ways of putting things together--he's got almost unlimited time to gather acceptable data. At this point, he observes in/out traffic logs for each hypothesized sender during a wide timespan before the arrival of the post at its destination. He compares activity inside that span with activity outside, over a large number of posts. If there is a correlation, then he's got the e-mail address of the nym. There are ways to get around this second attack, at least to some extent. However, I don't think it's wise to count on even very good remailer networks (i.e., the Mixmaster stuff) to protect your anonymity in this situation. (However, note that I'm thinking in terms of a very well funded, determined adversary. It's probably not too bad to count on it to protect your anonymity from technically unsophisticated attackers--but I wouldn't recommend using it for things that (say) the FBI or NSA might get very interested in.) I think the best defense against this will be something like this: Each user sets a quota of how much trafic he will take in and send out per day. Once per day, he goes through an interaction in which he downloads and uploads that much stuff, whether there's any of it for him or not. (Naturally, this won't be detectable from looking at the transmission, timing the interaction, etc.) This makes any volume variations per day disappear. Unfortunately, it also limits the user's total inflow and outflow, which means he'll have to set it to something larger than the maximum he ever expects to get. (It would be possible to have occasional overflow onto the next day's downloads, but not too often, or the user would fall further behind, on average, each day.) The size of these quotas will still leak some information, though not enough for the kinds of attacks I discussed above. Note: Please respond via e-mail as well as or instead of posting, as I get CP-LITE instead of the whole list. --John Kelsey, jmkelsey@delphi.com / kelsey@counterpane.com PGP 2.6 fingerprint = 4FE2 F421 100F BB0A 03D1 FE06 A435 7E36 -----BEGIN PGP SIGNATURE----- Version: 2.6.2 iQCVAwUBMdRWR0Hx57Ag8goBAQHCowQA71WBKkx1yonS0dEpy3pe7lgvSJPkpLUk zLjm0KeFoP+HGQBep48iILRYBlbGy5czcxNCU4zhE6+c4PWwvD+BpaGGccWWkyRi 0l/rdo5L5/1KgnpCAQJ/HNyRH0fO2NNOHvGB3m7I0H3lfmfOlNed8oIIjPFDVB23 60wpMZ9S93w= =HC1g -----END PGP SIGNATURE-----