Re: traffic analyzing Chaum's digital mix
-----BEGIN PGP SIGNED MESSAGE----- From: Wei Dai <weidai@eskimo.com>
I have been thinking about the problem of traffic analysis of a remailer. [...] The basic approach is to use this raw traffic information to calculate a SCORE for each user of the remailer with respect to Alice, where the user with the highest SCORE is the person Alice is most probably communicating with. The idea is that with a Chaumian mix, every time Alice sends a message to Bob there is always a pattern of Alice sending a message to the mix, followed by Bob receiving a message from the mix during the next batch. By counting the number of such correlations for each user over a period of time, and taking into account the fact that users who receive more messages from the mix will have higher numbers of coincidental correlations, a SCORE can be calculated so that it would be a good indication over the long run of the probability that a particular user is communicating with Alice.
This sounds like a good idea. It was very interesting to see your earlier result on the impact of dummy messages on this approach. Even a relatively small number of batches without dummy messages allows continual accumulation of incriminating information. I know that the Eurocrypt 89 proceedings had some articles on cryptanalyzing Chaum's mixes. My library has an excellent crypto selection but is missing this volume. Can anyone who has read this say whether there is anything in those papers that isn't obvious? Another interesting aspect of your analysis is the possible role of latency. Earlier I had thought of latency as primarily a way of doing mixing, an alternative or addition to batching which mixes messages without holding them up quite as much. But in terms of this in/out analysis latency could play a part in blurring the batch boundaries, adding more uncertainty and making the job of the analyst harder so he would need more data to establish his scores. Hal -----BEGIN PGP SIGNATURE----- Version: 2.6 iQBVAwUBLx/jixnMLJtOy9MBAQGFzwH/diYW0NSddacKyXGvsBc53FsR47R+4BSS pVprHz2LfpVl7U2FFAePMjZIGr5w24hA6nxn1brAO9v6JkVzgUabvA== =Vehs -----END PGP SIGNATURE-----
| -----BEGIN PGP SIGNED MESSAGE----- | | From: Wei Dai <weidai@eskimo.com> | > I have been thinking about the problem of traffic analysis of a | > remailer. | > [...] | > The basic approach is to use this raw traffic information to calculate a | > SCORE for each user of the remailer with respect to Alice, where the | > user with the highest SCORE is the person Alice is most probably | > communicating with. The idea is that with a Chaumian mix, every time | > Alice sends a message to Bob there is always a pattern of Alice sending | > a message to the mix, followed by Bob receiving a message from the mix | > during the next batch. By counting the number of such correlations for | This sounds like a good idea. It was very interesting to see your | earlier result on the impact of dummy messages on this approach. Even a | relatively small number of batches without dummy messages allows | continual accumulation of incriminating information. It would seem that Alice can protect Bob (or Bob can protect himself) by engaging in multiple conversations through the mix. I was thinking earlier about the concept of bit buckets; people who agree to get mail that they ignore. Alice could, when talking to Bob, send copies along the way to Fred, George, and Harry, each of whom would be running a mailbot that sees the mail is not for them, and deletes it (or, perhaps better, generates a response of encrypted nonsense to flow through the mix for a while.) Adam -- "It is seldom that liberty of any kind is lost all at once." -Hume
-----BEGIN PGP SIGNED MESSAGE----- On Fri, 20 Jan 1995, Hal wrote:
Another interesting aspect of your analysis is the possible role of latency. Earlier I had thought of latency as primarily a way of doing mixing, an alternative or addition to batching which mixes messages without holding them up quite as much. But in terms of this in/out analysis latency could play a part in blurring the batch boundaries, adding more uncertainty and making the job of the analyst harder so he would need more data to establish his scores.
Latency (by which I take to mean some kind of random delay) will probably make the analyst's job harder, but I suspect not by much. The method of analysis I outlined earlier can be modified to apply to mixes that use random delay instead of batching as the method of mixing. Instead of adding up the number of times Alice's message to the mix is followed up by a message from the mix to a user, take the sum of the probabilities that each message the user receives is from Alice. So you would do something like this for each user of the mix: message # probability this message came from Alice 1 0.000135 2 0 3 0.000012 4 0.004332 SUM: 0.004479 Each probability can be calculated from the statistical distribution of the delay time, the length of time between the Alice sending the last message to the mix and the user receiving a message from the mix, and the timing and number of other messages sent by the mix around this period of time. This method is more general than the one I talked about earlier, since it is equivelent to the former method when you apply it to a batching mix (that is, the original Chaumian mix). -----BEGIN PGP SIGNATURE----- Version: 2.6.2 iQCVAwUBLyB8YDl0sXKgdnV5AQHZxAQApKQgYfhGhBu+3QXzCEi1/3B55jgdHa6X 6ZeZQWZYjEhLXnOA6Z4SEHKjOVYpMHb+VkvW+vG+QZVR+cjajstg6HczwEguXjSX ObTm2gaQGRFaUOD+0fUEWFxxkqNxYEL0hRAesX3TyGYI/MQ4WzysweCzCk75+Dm2 glKeTRgnFKo= =36jW -----END PGP SIGNATURE-----
participants (3)
-
Adam Shostack -
Hal -
Wei Dai