anonymous mailing lists
-----BEGIN PGP SIGNED MESSAGE-----

To: Cypherpunks
Date: 06/28/96 04:34 pm
Subject: anonymous mailing lists

Date: Fri, 14 Jun 1996 02:15:03 +0000 (GMT)
From: Ecafe Mixmaster Remailer <mixmaster@remail.ecafe.org>
Subject: Hackerpunks and C2
The proposal for a Hackerpunks nym-based mailing list is interesting; however, there are some concerns regarding the list's susceptibility to traffic analysis.
I was thinking about attacks that can be carried out on remailers in general, and came up with something that is potentially pretty nasty, especially for anonymous mailing lists and for people who post a lot of stuff anonymously using ``nyms.''

Let's imagine an anonymous remailer network as a ``black box'' which functions perfectly. Messages (broken into equal-sized packets and strongly encrypted) are sent into the network by the sender, and at some later time they come out at the receiver. Let's assume there is no possible way for an attacker to trace a message through this network. We still have to deal with two more issues: mail goes into the network, and mail comes out of the network. At those two points, traffic analysis is available. Specifically, we can see how much data goes over the line. (Naturally, if it's encrypted, we can't tell how much of it is real data and how much is padding.)

Generally, when we're attacking this system, we're trying to figure out either the sender or the receiver of a message (or a sequence of messages), based on what we can observe coming into and going out of the network of remailers. There are basically five scenarios:

1. The sender wants to know who the receiver of his message is.
2. The receiver wants to know who the sender of his message is.
3. An outsider wants to identify the sender of a message.
4. An outsider wants to identify the receiver of a message.
5. One receiver of a message wants to know who the other receivers of the message are. (This is the case for anonymous lists.)

Now, there are a couple of different ways these attacks can be carried out. Usually, I've seen people talk about ``tracing'' attacks, in which a message is traced from one side of the network to the other, without any clear idea of who might be on the other side. However, I think a more realistic situation is to imagine the attacker trying to test the hypothesis that some particular person is on the other end of the anonymous transmission.
If relatively few people regularly send or receive anonymous e-mail, then this is practical for many kinds of tests. It's even more practical when we're dealing with relatively small populations of people interested in some technical subject. (This is conceptually similar to the ``dictionary attack'' on passphrases.) Basically, what we're looking at, in that case, is some test which (with reasonably high probability) determines whether some person is the sender or receiver of a given message or stream of messages. This leads to some interesting insights:

1. In reasonably large text messages, it's probably easy to test hypotheses about senders. There are metrics that can more or less identify the writer of a piece of prose. While it's no doubt possible to defeat this kind of analysis for some things (e.g., blackmail notes or rigidly defined messages in a cryptographic protocol), I suspect that it is very hard to defeat for a mailing list whose objective is to discuss serious technical issues. (This kind of analysis also causes headaches for people trying to do strong steganography in text.)

2. If an attacker (e.g., the NSA) logs the total volume of all traffic into and out of the remailer network, along with whom each message came from or went to, then that attacker can probably mount some very powerful hypothesis-testing attacks. It's these attacks I want to discuss.

If Alice sends a message to Bob through the remailer network, two things must happen to prevent it from being trivially traceable:

1. The message has to change size. If the message is already encrypted, then compression isn't much of an option, so what's left is padding it out by a random amount. The amount of padding per message is probably a uniformly distributed random variable.

2. The message has to be delayed somewhat. The delay is probably also a uniformly distributed random variable, or possibly the sum of N such variables, where N is the number of chained remailers.
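The two requirements above (random padding and random delay) can be sketched for a single hop. This is a minimal Python sketch, not Mixmaster's format: it assumes a hypothetical framing where a 4-byte length prefix and the payload are followed by a uniformly random amount of padding, and it assumes the padded result is then encrypted for the next hop so the prefix never appears on the wire. `pad`, `unpad`, `chain_delay`, `MAX_PAD`, and the one-hour per-hop delay bound are illustrative choices.

```python
import os
import random
import struct

MAX_PAD = 4096  # hypothetical upper bound on padding, in bytes


def pad(payload: bytes, rng: random.Random) -> bytes:
    """Length-prefix the payload, then append 0..MAX_PAD random bytes.
    Assumes the result is encrypted afterwards, so an observer sees only
    the padded total size, not the length prefix."""
    n = rng.randint(0, MAX_PAD)  # padding size drawn uniformly
    return struct.pack(">I", len(payload)) + payload + os.urandom(n)


def unpad(padded: bytes) -> bytes:
    """Recover the payload using the 4-byte big-endian length prefix."""
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]


def chain_delay(rng: random.Random, hops: int, max_hop_delay: float = 1.0) -> float:
    """Total latency through `hops` chained remailers, each holding the
    message for a uniform random time in [0, max_hop_delay] hours."""
    return sum(rng.uniform(0.0, max_hop_delay) for _ in range(hops))
```

For N hops with a per-hop bound of m, the total delay has mean N·m/2 and variance N·m²/12, so the delay is noisy for any one message but has a well-defined average that an attacker can estimate.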
For a single mailing, this is probably not much of a threat. There will be enough ``noise'' in the delay and padding that most transmissions will be masked. However, consider the situation of a mailing list. Alice and Bob are both recipients of the list. Alice wants to decide whether Bob is receiving the list. Let D be a delay such that, if Alice received her copy at time T, 90% of the other list members received their copies between T-D and T+D. Now Alice compares Bob's anonymous e-mail volume during that time span with his volume at all other times. If he's receiving the same stuff she is, then there should be an increase within that span, on average. The random distribution of arrival times will mask individual transmissions, but over many messages it probably will not. (This is conceptually similar to the situation in Paul Kocher's timing attacks: adding some random pauses doesn't hurt the attack as much as most people expect, because those random pauses, summed over many messages, become a normally distributed random variable.) The average amount of anonymous e-mail Bob gets per day doesn't have much effect, nor do occasional worst cases.

The only ways I can see to prevent this attack are either to ensure that Bob gets a constant rate of information from the anonymous remailer network, or to make the arrival-time span so large that other randomness in the sample makes the change in volume undetectable. In general, I don't think the second will work without accepting enormous delays on messages.

This can also be adapted to tracing anonymous posters to newsgroups and mailing lists back to their real addresses, when they use a consistent nym. (They could also be traced by textual analysis.) In this case, the attacker starts by posting some anonymous messages himself (not using a nym; he doesn't need one), to get statistics on the average delay and the average amount of padding.
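Alice's mailing-list test can be made concrete with a small simulation. This is a hedged Python sketch under assumed parameters (a 10,000-hour observation period, 300 list posts, 5,000 unrelated background messages per person, three hops with uniform 0 to 1 hour delays, and D = 2 hours); `simulate`, `window_lift`, and `arrival_delay` are hypothetical helpers. The statistic is Bob's arrival rate inside the [T-D, T+D] windows around Alice's arrivals divided by his rate everywhere else: near 1.0 means no correlation. Carol, who is not subscribed but receives the same total volume, is the control.

```python
import random


def arrival_delay(rng: random.Random, hops: int = 3, max_hop_delay: float = 1.0) -> float:
    """Latency through `hops` remailers, each adding a uniform delay (hours)."""
    return sum(rng.uniform(0.0, max_hop_delay) for _ in range(hops))


def window_lift(alice_times, target_times, horizon, d, bin_width=0.25):
    """Target's arrival rate inside [T-d, T+d] around each of Alice's arrivals,
    divided by his rate outside those windows. Time is split into bins so
    overlapping windows are counted once."""
    n_bins = int(horizon / bin_width)
    marked = [False] * n_bins
    for t in alice_times:
        lo = max(0, int((t - d) / bin_width))
        hi = min(n_bins - 1, int((t + d) / bin_width))
        for b in range(lo, hi + 1):
            marked[b] = True
    inside = sum(marked)
    outside = n_bins - inside
    hits_in = hits_out = 0
    for t in target_times:
        b = int(t / bin_width)
        if 0 <= b < n_bins:
            if marked[b]:
                hits_in += 1
            else:
                hits_out += 1
    # Bin widths cancel, so this is the ratio of per-hour rates.
    return (hits_in / inside) / (hits_out / outside)


def simulate(seed=1996, horizon=10_000.0, posts=300, background=5_000, d=2.0):
    rng = random.Random(seed)
    post_times = [rng.uniform(0.0, horizon - 10.0) for _ in range(posts)]
    alice = [t + arrival_delay(rng) for t in post_times]
    # Bob is subscribed: each post reaches him with an independent delay,
    # mixed in with his unrelated anonymous mail.
    bob = [t + arrival_delay(rng) for t in post_times]
    bob += [rng.uniform(0.0, horizon) for _ in range(background)]
    # Carol is not subscribed but gets the same total number of messages.
    carol = [rng.uniform(0.0, horizon) for _ in range(background + posts)]
    return window_lift(alice, bob, horizon, d), window_lift(alice, carol, horizon, d)


bob_lift, carol_lift = simulate()
print(f"Bob: {bob_lift:.2f}  Carol: {carol_lift:.2f}")
```

With these assumed numbers, Bob's ratio should land well above 1 and Carol's close to 1. No single posting stands out against the delay noise, but a few hundred of them do, even though Bob's background volume is ten times larger than the list traffic.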
He may do this for several different ways of putting things together; he has almost unlimited time to gather acceptable data. At this point, he observes the in/out traffic logs for each hypothesized sender during a wide time span before the arrival of the post at its destination. He compares activity inside that span with activity outside it, over a large number of posts. If there is a correlation, then he has the e-mail address of the nym.

There are ways to get around this second attack, at least to some extent. However, I don't think it's wise to count on even very good remailer networks (e.g., the Mixmaster stuff) to protect your anonymity in this situation. (Note that I'm thinking in terms of a very well-funded, determined adversary. It's probably not too bad to count on such a network to protect your anonymity from technically unsophisticated attackers, but I wouldn't recommend using it for things that, say, the FBI or NSA might get very interested in.)

I think the best defense against this will be something like the following. Each user sets a quota of how much traffic he will take in and send out per day. Once per day, he goes through an interaction in which he downloads and uploads that much data, whether or not any of it is actually for him. (Naturally, whether any of it is real won't be detectable from looking at the transmission, timing the interaction, etc.) This makes any per-day volume variations disappear. Unfortunately, it also limits the user's total inflow and outflow, which means he'll have to set the quota to something larger than the maximum he ever expects to get. (It would be possible to let occasional overflow spill onto the next day's downloads, but not too often, or the user would fall further behind, on average, each day.) The size of these quotas will still leak some information, though not enough for the kinds of attacks I discussed above.

Note: Please respond via e-mail as well as or instead of posting, as I get CP-LITE instead of the whole list.
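The daily-quota defense can be sketched as a function that always emits exactly the quota, topping up with dummies and carrying overflow into the next day. This is a minimal Python sketch; the 1024-byte packet size, the quota values, and the name `daily_batch` are assumptions for illustration, and the dummies only blend in if real packets are already encrypted to random-looking bytes of the same fixed size.

```python
import os
from collections import deque

PACKET_SIZE = 1024  # hypothetical fixed packet size, in bytes


def daily_batch(queue: deque, quota: int) -> list:
    """Return exactly `quota` packets for today's exchange.

    Real packets are taken first-in, first-out; any shortfall is filled with
    random dummy packets; real packets beyond the quota stay in `queue` and
    go out the next day (the occasional overflow described above)."""
    batch = []
    while queue and len(batch) < quota:
        batch.append(queue.popleft())
    while len(batch) < quota:
        # Assumes real packets are ciphertext, so random bytes look the same.
        batch.append(os.urandom(PACKET_SIZE))
    return batch


# Three real packets against a quota of two: two go today, one waits.
pending = deque([os.urandom(PACKET_SIZE) for _ in range(3)])
today = daily_batch(pending, quota=2)
tomorrow = daily_batch(pending, quota=2)  # one real packet plus one dummy
```

Whatever the real load, an observer sees `quota` packets per day. The cost is visible in the example: any backlog larger than the quota has to wait, which is why the quota must exceed the expected peak.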
--John Kelsey, jmkelsey@delphi.com / kelsey@counterpane.com
PGP 2.6 fingerprint = 4FE2 F421 100F BB0A 03D1 FE06 A435 7E36

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBMdRWR0Hx57Ag8goBAQHCowQA71WBKkx1yonS0dEpy3pe7lgvSJPkpLUk
zLjm0KeFoP+HGQBep48iILRYBlbGy5czcxNCU4zhE6+c4PWwvD+BpaGGccWWkyRi
0l/rdo5L5/1KgnpCAQJ/HNyRH0fO2NNOHvGB3m7I0H3lfmfOlNed8oIIjPFDVB23
60wpMZ9S93w=
=HC1g
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----

In article <199606290404.XAA32220@manifold.algebra.com>, Igor Chudov @ home <ichudov@algebra.com> wrote:
How about this attack: suppose I want to find out who hides behind an alias MightyPig@alpha.c2.org and I have the ability to monitor all internet traffic. Then I simply start mailbombing that address and see whose account gets unusually high traffic volume.
A nice, albeit quite expensive, way of protection from traffic analysis is to create a mailing list (or a newsgroup) and forward all messages to all users of that mailing list or newsgroup. Of course, since messages are encrypted, only the recipients will be able to decrypt them.
This way the list of suspects is all subscribers of that list or newsgroup and there is no way to discriminate them.
Instead of sending every message to all recipients all the time, alpha.c2.org may be programmed so that it sends each message not only to one recipient X, but to X and 20 other randomly selected people.
It apparently makes traffic analysis much harder.
Then users of alpha.c2.org will have to install mail filters that automatically delete all incoming mail not intended to be read by them (they can't read such messages anyway).
- Igor.
[I'm copying this to remailer-operators.]

Yesterday, Dave and I discussed at length a design for a new remailer network. It was motivated by the fact that, when I installed mixmaster, it mentioned Diffie-Hellman and direct socket connections as a "future expansion" item. Well, IMHO, that time has come. I wanted to hack mixmaster to accept ecash postage for the last hop anyway, so I may as well put in the direct-connection bits too. I'll post more about this when we've discussed it further (and with Lance), and when I'm on a faster link to the net.

Basically, the idea is that every remailer gets a copy of every encrypted message, using a randomized fill algorithm and D-H encrypted links. If you're a remailer in this network and you get a message:

1. If you've seen this message before, drop it (this step needs more thought).
2. If you can decrypt the message, do so, and handle the decrypted copy (but continue with the following steps using the original message anyway).
3. If you have a message waiting to be inserted into the remailer network, drop the incoming message and take that message instead.
4. Take whatever message you now have, and queue it to be sent to 5 random remailers.
5. Every so often, fill your queue to a constant size with dummy messages, and send some (possibly smaller, randomly chosen) constant number of them on their way.

All messages should, of course, be packetized to the same size, a la mixmaster.

The result is that, if you are part of this network, it should be impossible for anyone to tell when you receive a message, as opposed to anyone else in the network (think alt.anonymous.messages, but where the links are D-H encrypted, you have a news feed to your own machine, the message sizes are all the same, and so on...). This is perfect for making nyms. Sender anonymity is achieved by chaining.
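The five steps can be sketched as a per-node message handler. This is a hedged Python sketch of the idea as described, not of any mixmaster code: the D-H links and real decryption are left out, `try_decrypt` is a hypothetical callback standing in for "can I decrypt this?", and the packet size, the `Node` class, and the use of a SHA-256 digest to recognize already-seen messages are all assumptions.

```python
import hashlib
import os
import random
from collections import deque

PACKET_SIZE = 1024  # hypothetical uniform packet size (bytes)
FANOUT = 5          # step 4: forward to 5 random remailers


class Node:
    def __init__(self, peers, try_decrypt, rng=None):
        self.peers = list(peers)        # names of other remailers
        self.try_decrypt = try_decrypt  # hypothetical: plaintext, or None if not ours
        self.rng = rng if rng is not None else random.Random()
        self.seen = set()               # digests of packets already handled
        self.to_inject = deque()        # our own packets waiting to enter the network
        self.outbox = deque()           # (peer, packet) pairs queued for sending
        self.delivered = []             # plaintexts addressed to us

    def receive(self, packet: bytes) -> None:
        digest = hashlib.sha256(packet).digest()
        if digest in self.seen:                      # step 1: drop repeats
            return
        self.seen.add(digest)
        plain = self.try_decrypt(packet)             # step 2: keep our copy...
        if plain is not None:
            self.delivered.append(plain)             # ...but still forward the original
        if self.to_inject:                           # step 3: swap in our own packet
            packet = self.to_inject.popleft()
            self.seen.add(hashlib.sha256(packet).digest())
        k = min(FANOUT, len(self.peers))
        for peer in self.rng.sample(self.peers, k):  # step 4: fan out
            self.outbox.append((peer, packet))

    def tick(self, fill_to: int, send: int) -> list:
        """Step 5: top the queue up with dummies, then release a fixed number."""
        while len(self.outbox) < fill_to:
            self.outbox.append((self.rng.choice(self.peers), os.urandom(PACKET_SIZE)))
        return [self.outbox.popleft() for _ in range(min(send, len(self.outbox)))]
```

Because step 3 substitutes an injected packet for the incoming one, the node forwards the same number of packets whether or not it originated anything, which is what makes a sender look like any other relay.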
If you are part of the network, you can always claim that a message you sent was just one you received from somewhere else (since you got the messages over D-H links, nobody can identify where they came from). So if you're part of the network, it would seem you are indistinguishable from anyone else on the network.

Here is where the tradeoff occurs. How big should the network be? If it's too small, the above anonymity doesn't gain you much. If it's too big, you may not be able to handle all of the remailer traffic.

Also, what are the issues for people who aren't on the network? It will be very hard to keep observers from noticing that such people are sending a message to the network, or receiving one from it, so it seems the best we can do is to prevent anyone from linking incoming messages to outgoing ones. A way to help this is to have a (smaller) number of nodes be the only ones which send mail _out_ of the network. One idea which I'd like to try is having that last remailer charge postage in order to send mail out. After all, he is probably the one who will take the "heat" for the anonymous message. By concentrating the outgoing messages, it should be easier to do the latency and reordering tricks.

- Ian

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBMdVUL0ZRiTErSPb1AQH3mAQAhf0Lgh2cpahbF8JrB+hhD8ZP3oV3v9bA
UsfRFEV+vcQtCopvwEsXGz6FvuyrxvYzxWE+74iPBlY204eeiTFZ0n1zq8qGRIuw
kUgdM0jgNX5v5nmv+EaUeeCkuRQ5JEqIevlaD9iaK3iYO2mAVg8HFxzdmV0kLPq1
hLehErR+GX4=
=7JBM
-----END PGP SIGNATURE-----
Ian Goldberg wrote:
Also, what are the issues for people who aren't on the network? It will be very hard to prevent people from noticing that they're sending a message to the network, or receiving one from it, so it seems the best we can do is to avoid letting someone be able to link incoming messages to outgoing ones. A way to help this is to have a (smaller) number of nodes be the only ones which send mail _out_ of the network. One idea which I'd like to try is having that last remailer charge postage in order to send mail out. After all, he is the one who will take the "heat" for the anon message, probably. By concentrating the outgoing messages, it should be easier to do the latency and reordering tricks.
Hm, I wonder what it would take to incorporate encryption straight into sendmail (I am talking about encrypting not only message bodies, but also the MAIL FROM: and RCPT TO: data). The SMTP protocol extension would be something like this:

- In its welcome message, a server may say "PGP Enhanced".
- If the client sees this substring, the client (after HELO) may send the command SENDKEY.
- If the server answers "500 Command unrecognized", the exchange proceeds in the normal way.
- If instead the server sends text with 214 preceding each line, followed by a final ".", that text is taken to be a PGP key for the exchange.
- Then the usual MAIL FROM: and RCPT TO: follow, followed by the DATA command. The data sent by the client will be PGP encrypted. Moreover, the data may carry MAIL FROM: and RCPT TO: fields preceding any header information and the message body. These inner RCPT TO: and MAIL FROM: fields override anything that was supplied in clear text before the DATA command.

This change to the protocol is relatively simple to implement and does not require sendmail itself to contain any cryptographic subroutines. Instead, sendmail simply calls public-key encryption programs with the right command-line parameters when an encrypted message is received. It is also possible to incorporate Latent-Time: into such messages.

What this gives us is that a great number of systems can participate in more secure mail exchange. It gives clear advantages to each site running it, because those sites can then exchange mail securely for all users. Users of the remailer network may use such PGP-enhanced hosts to conceal their use of the remailer network. It is rather obvious that when the number of PGP-enhanced mailers becomes large, it will be hard to tell who is and who is not using the remailer network.

- Igor.
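The override rule for the encrypted DATA section can be sketched on the receiving side, after decryption. This is a minimal Python sketch assuming a hypothetical layout in which zero or more `MAIL FROM:` / `RCPT TO:` lines, separated by CRLF, precede the headers; the function name and example addresses are illustrative, and the decryption itself (the call out to an external PGP program) is omitted.

```python
def apply_inner_envelope(clear_from: str, clear_rcpts: list, data: str):
    """Return (sender, recipients, message) after letting MAIL FROM:/RCPT TO:
    lines at the start of the decrypted DATA override the cleartext envelope."""
    lines = data.split("\r\n")
    inner_from, inner_rcpts = None, []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.upper().startswith("MAIL FROM:"):
            inner_from = line[len("MAIL FROM:"):].strip()
        elif line.upper().startswith("RCPT TO:"):
            inner_rcpts.append(line[len("RCPT TO:"):].strip())
        else:
            break  # first header line: the inner envelope is over
        i += 1
    message = "\r\n".join(lines[i:])
    return inner_from or clear_from, inner_rcpts or clear_rcpts, message


sender, rcpts, msg = apply_inner_envelope(
    "<cover@relay.example>", ["<cover@relay.example>"],
    "MAIL FROM:<alice@a.example>\r\nRCPT TO:<bob@b.example>\r\nSubject: hi\r\n\r\nbody",
)
```

Under this reading, the cleartext envelope an eavesdropper sees can be a cover address, while the real routing travels inside the encrypted DATA.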
How about this attack: suppose I want to find out who hides behind the alias MightyPig@alpha.c2.org and I have the ability to monitor all internet traffic. Then I simply start mailbombing that address and see whose account gets unusually high traffic volume.

A nice, albeit quite expensive, way of protection from traffic analysis is to create a mailing list (or a newsgroup) and forward all messages to all users of that mailing list or newsgroup. Of course, since messages are encrypted, only the recipients will be able to decrypt them.

This way the list of suspects is all subscribers of that list or newsgroup, and there is no way to discriminate among them.

Instead of sending every message to all recipients all the time, alpha.c2.org may be programmed so that it sends each message not only to one recipient X, but to X and 20 other randomly selected people. This apparently makes traffic analysis much harder.

Then users of alpha.c2.org will have to install mail filters that automatically delete all incoming mail not intended to be read by them (they can't read such messages anyway).

- Igor.
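The decoy fan-out proposed above can be sketched in a few lines. This is a hypothetical Python sketch, not alpha.c2.org's code; `fan_out` and the subscriber addresses are made up. It draws fresh decoys for every message and shuffles the delivery order so position reveals nothing.

```python
import random


def fan_out(recipient, subscribers, rng, decoys=20):
    """Delivery list for one message: the real recipient X plus `decoys`
    other subscribers chosen at random, shuffled so order reveals nothing."""
    others = [s for s in subscribers if s != recipient]
    targets = [recipient] + rng.sample(others, min(decoys, len(others)))
    rng.shuffle(targets)
    return targets


subscribers = [f"user{i}@example.org" for i in range(500)]
targets = fan_out("user7@example.org", subscribers, random.Random(1996))
```

Each of the 21 copies would be the same encrypted blob, so on the wire the real recipient looks like any decoy for a single message; the repeated-message case is where this breaks down.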
On Fri, 28 Jun 1996 ichudov@algebra.com wrote:
Then users of alpha.c2.org will have to install mail filters that automatically delete all incoming mail not intended to be read by them (they can't read such messages anyway).
How exactly would this be done? Since messages from alpha.c2.org are conventionally encrypted, they don't contain key IDs. Wouldn't that require every recipient to store his/her passphrase and call pgp for every message to see if it could be decrypted? This in and of itself would be a more serious security breach, not to mention an _enormous_ drain on site resources.
On Fri, 28 Jun 1996, Igor Chudov @ home wrote:
How about this attack: suppose I want to find out who hides behind an alias MightyPig@alpha.c2.org and I have the ability to monitor all internet traffic. Then I simply start mailbombing that address and see whose account gets unusually high traffic volume.
A nice, albeit quite expensive, way of protection from traffic analysis is to create a mailing list (or a newsgroup) and forward all messages to all users of that mailing list or newsgroup. Of course, since messages are encrypted, only the recipients will be able to decrypt them.
This way the list of suspects is all subscribers of that list or newsgroup and there is no way to discriminate them.
Instead of sending every message to all recipients all the time, alpha.c2.org may be programmed so that it sends each message not only to one recipient X, but to X and 20 other randomly selected people.
It apparently makes traffic analysis much harder.
Then users of alpha.c2.org will have to install mail filters that automatically delete all incoming mail not intended to be read by them (they can't read such messages anyway).
- Igor.
I think that traffic analysis can best be defeated by powerful filtering rather than by any kind of multiple sending. Eventually (as the number of messages to a particular party grows beyond the number of distractor messages sent with each mailing), it will be possible to notice the statistical difference between the number of messages sent to the random 20 people and the number sent to the actual recipient. A mailbombing will still reveal the true identity of the addressee, as the 20 distractor addresses will be randomly selected each time, and the addressee will not.

Instead, one might suggest, the same 20 people should be used as distractors every time. Unfortunately, this leaves the actual addressee open to disclosure when he or she responds to alpha-forwarded messages (you were assuming all internet traffic would be monitored, so the response timing would be a major clue).

I think the real answer to this is going to be open-access pools. All encrypted messages will be left in a collective POP account, accessible to anyone at all. An agent could easily be written to poll the POP account, download the entire queue of messages, and locally decode and make available only the ones addressed to the addressee. I suspect the best policy would be to purge the POP account once a month of messages older than 2 months. Traffic analysis will reveal who polls the POP account, but not much else. I suppose this could even work today if someone wrote a clever agent to poll alt.anonymous.messages.
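The objection that freshly drawn distractors let a mailbombing expose the real addressee is easy to check numerically. This hedged Python sketch assumes 1,000 users, 20 random distractors per message, and 50 bomb messages; the function name and numbers are illustrative. The observer simply counts how many bomb-sized deliveries each user receives.

```python
import random
from collections import Counter


def mailbomb_counts(target, users, rounds, decoys=20, seed=0):
    """Count deliveries per user when every bomb goes to `target` plus
    `decoys` distractors re-drawn at random for each message."""
    rng = random.Random(seed)
    others = [u for u in users if u != target]
    counts = Counter()
    for _ in range(rounds):
        counts[target] += 1
        counts.update(rng.sample(others, decoys))
    return counts


users = [f"user{i}" for i in range(1000)]
counts = mailbomb_counts("user42", users, rounds=50)
print(counts.most_common(3))
```

The target receives all 50 bombs, while any given distractor averages about one (50 × 20 / 999), so the addressee stands out after a handful of rounds.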
participants (5)

- Black Unicorn
- iang@cs.berkeley.edu
- ichudov@algebra.com
- Jeffrey A Nimmo
- JMKELSEY@delphi.com