Re: [tor-talk] End-to-end correlation for fun and profit
Thus spake Maxim Kammerer (mk@dee.su):
On Sat, Aug 25, 2012 at 1:12 AM, Mike Perry <mikeperry@torproject.org> wrote:
The Raccoon has made a believer out of me, but there are some limits to both of his/her proofs.. The full proofs can still be found here: http://web.archive.org/web/20100416150300/http://archives.seul.org/or/dev/Se...
Wrt. the first proof, it seems to me that the assumed correlation accuracy rate of 99.9% is incredibly low, and I think that the Raccoon recognized that by referring to sampling and retention at the end of his post. With the targeted attack that's similar to b Example 3b in Raccoon's post that I described in my previous comment here, where one analyzes all exit traffic without missing packets, I would expect the correlation accuracy (and as a result, match confidence) to exponentially approach 100% very quickly with the number of relevant packets seen, and extremely quickly if the traffic is interactive (i.e., browsing).
Actually, c/n of 30% in b Example 3b is close to the 25% that's discussed in the OP here, so let's redo the example with c/n=25% and different correlation accuracies (leaving the other numbers intact):
(using b bc -lb) ca = 0.999 pm = (1/5000) * (0.25)^2 ca*pm / (pm*ca + (1-pm)*(1-ca))
ca = 0.999 .01233363786760166917 ca = 0.9999 .11110246894375430565 ca = 0.99999 .55555617284636495961 ca = 0.999999 .92592671467910125759 ca = 0.9999999 .99206358969515668554 ca = 0.99999999 .99920064946444143613 ca = 0.999999999 .99992000739924807495
So reducing correlation accuracy error to 10^-9 will give you 99.99% confidence in end-to-end correlation match. I suspect that a few seconds of interactive traffic will give you a correlation accuracy that's much better than a 10^-9 error.
Well, the argument over correlation accuracy comes down to observation resolution, feature extraction ability, and academic lab conditions versus reality. For an example, let's assume that the adversary cannot see inside of Guard TLS connections. With this assumption: if at any point there's concurrent Guard TLS activity from a single client (either other circuit activity, directory fetch activity, or circuit pre-building activity), then some or all of your fine-grained timing and size information features go out the window. To see the effects of this currently, consider: Is it *really* the case that only one connection in *a billion* experiences incidental concurrent activity that interferes with or obliterates high-resolution feature extraction? I think the actual rate of random (or deliberate) concurrent activity is much higher than that, especially for heavily used tor clients, and even more so if they are serving as bridges or relays. But, against high-resolution adversaries, the really interesting question is: How little real cover traffic is actually needed to obscure timing and size information to the point where the remaining features are insufficient for high rates of correlation success? And over how many observations can such activity be expected to survive for a given base rate of similar activity? I suspect that for relatively short-lived bursts of traffic like web site views and random webapp AJAX activity, we can actually do pretty well with very little effort and overhead. Especially against the one-ended version of the correlation attack: the website fingerprinting attack, but probably against both. But for long-lived or otherwise atypical connections, you're absolutely right. There's just a whole lot of information encoded there.. Almost any level of observation will likely be able to correlate such flows eventually, and it's also hard to imagine generalized padding techniques that could blend these flows with web traffic. Unfortunately, because academia has mostly concluded that this work is uninteresting and that all forms of this problem are generally "impossible", we have no solid answers to these types of questions wrt what can be done in practice. Perhaps it is merely because defense work is less sexy than attack work when it comes to getting publications. I don't know for sure. I haven't yet figured out exactly why CS academia is broken. There's a whole lot of symptoms, though... But anyway, failing real research, there's always the botnets, the drug war, and the aliens to guide us... Can I get three cheers for Big Data? After all, I'm sure we can trust Them to tell us how the science shakes out in the end, amirite? ;). -- Mike Perry _______________________________________________ tor-talk mailing list tor-talk@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk ----- End forwarded message ----- -- Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
participants (1)
-
Mike Perry