Re: [tor-talk] End-to-end correlation for fun and profit
Thus spake Ted Smith (tedks@riseup.net):
On Mon, 2012-08-20 at 10:33 +0300, Maxim Kammerer wrote:
Hello gentlemen, <snip> [1] http://pastebin.com/hgtXMSyx
I ran this script on the current consensus. The full results (the nodes-sniff-summary file) are below my signature. How did you compile the country-codes to IPs list? That wasn't produced by the script.
It's comforting that this approach yields quickly diminishing returns. Going from 25 to 60 networks only gets you a 10% increase in surveillance coverage (if I'm reading the output correctly), and returns plateau entirely at that point (I'm treating differences of about two percent as noise, which may not be appropriate in this domain).
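A simple product model shows why coverage would flatten like this. The sketch below is not the pastebin script (its method isn't reproduced here). It assumes that a watched network sees a circuit end to end only when both the client's Guard and the Exit sit inside watched networks, and that Guard and Exit are chosen independently by consensus weight. The network names and shares are invented for illustration, and `coverage_curve` is a hypothetical helper.

```python
# Toy end-to-end coverage model (hypothetical inputs, NOT the pastebin script's data).
# Assumption: a circuit is observed at both ends only when its Guard AND its Exit
# are inside watched networks, and Guard/Exit are chosen independently by weight.
from typing import List, Tuple

Network = Tuple[str, float, float]  # (name, guard_share, exit_share)


def coverage_curve(networks: List[Network]) -> List[Tuple[int, float]]:
    """Return (k, fraction of circuits observed at both ends) for the top k networks."""
    # Simple heuristic ranking: largest combined Guard + Exit share first.
    ranked = sorted(networks, key=lambda n: n[1] + n[2], reverse=True)
    guard_seen = 0.0
    exit_seen = 0.0
    curve = []
    for k, (_name, guard_share, exit_share) in enumerate(ranked, start=1):
        guard_seen += guard_share
        exit_seen += exit_share
        # Independent choices: P(both ends watched) = P(guard watched) * P(exit watched).
        curve.append((k, guard_seen * exit_seen))
    return curve


if __name__ == "__main__":
    # Made-up shares with a long tail, purely for illustration.
    example = [("AS-A", 0.20, 0.25), ("AS-B", 0.10, 0.15),
               ("AS-C", 0.05, 0.05), ("AS-D", 0.02, 0.01)]
    previous = 0.0
    for k, observed in coverage_curve(example):
        print(f"top {k}: {observed:.4f} observed (+{observed - previous:.4f})")
        previous = observed
```

Once the few large networks are included, each additional small network adds only its small share to an already-fixed product, so the marginal gain shrinks quickly.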
Also, it's not immediately clear whether eavesdropping on those networks would actually give you correlation strong enough to accurately de-anonymize users[1]. If our rodent(?) friend(s?) could comment on this, I'd appreciate their expertise.
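One way to frame "strong enough" is a base-rate calculation. This is a simplified Bayes-style sketch under stated assumptions, not the Raccoon's actual derivation or numbers. An observer sees n entry flows and n exit flows in the same window and tests all n^2 pairs. Only n of those pairs are real circuits. Each pair is flagged with a per-pair true positive rate `tpr` or false positive rate `fpr`. The function name and all parameter values are hypothetical.

```python
# Simplified base-rate model of dragnet end-to-end correlation (an assumption,
# not the Raccoon's exact derivation). With n entry flows and n exit flows in
# the same window, the observer tests all n^2 pairs; only n of them are real.


def match_precision(n_flows: int, tpr: float, fpr: float) -> float:
    """Probability that a pair flagged as 'matching' really is the same circuit."""
    true_pairs = n_flows                        # one real partner per flow
    false_pairs = n_flows * n_flows - n_flows   # every other candidate pair
    true_hits = tpr * true_pairs                # real circuits correctly flagged
    false_hits = fpr * false_pairs              # unrelated pairs that happen to look alike
    return true_hits / (true_hits + false_hits)


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 co-incident flows, 99.9% true positive rate.
    for fpr in (1e-3, 1e-6):
        print(f"fpr={fpr:g}: precision={match_precision(10_000, 0.999, fpr):.4f}")
```

With these made-up numbers, a 0.1% per-pair false positive rate leaves only about 9% of flagged pairs genuine, because the n^2 - n unrelated pairs swamp the n real ones. Coarser data retention would tend to push `fpr` up. Full-take capture might push it down.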
The Raccoon has made a believer out of me, but there are some limits to both of his/her proofs. The full proofs can still be found here:
http://web.archive.org/web/20100416150300/http://archives.seul.org/or/dev/Se...
https://lists.torproject.org/pipermail/tor-dev/2012-March/003347.html

The actual numbers from the examples of the first proof are affected by the resolution of the data retention. The core concept of the proof seems to hold no matter what (that full dragnet n^2 correlation is hard, and that the amount of similar co-incident traffic, aka the base rate, is what makes it hard), but if the adversary has full observation of *all* traffic data, they *might* be able to do better than a 99.9% true positive rate. It's not clear that low-resolution connection-level data retention, or even sampled netflow data, can provide anywhere near that true positive rate, though. A full adversary may also get to combine repeat observations (assuming it is possible to identify them as coming from the same user), but the post mentions that.

Incidentally, my guess is that this is one of the reasons for the huge boondoggle^W datacenter in Utah. They probably realized that to reliably track large botnet activity, they really needed to log all data forever. Well, keep sitting on the unpublished 0day software vulnerabilities, guys. That should totally help you solve both those problems, once and for all. Oh wait. ;)

Anyway, the key thing I think the first proof tells us is that even sloppy defenses against correlation attacks are likely to work against dragnet surveillance/data retention, especially if there is a lot of co-incident traffic to blend in with and the data retention resolution is low. I think this alone can justify experimentation with traffic padding to/from Guard nodes, where bandwidth is relatively cheap and plentiful.
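To illustrate why even sloppy padding on the Guard link might blunt correlation, here is a toy simulation. Everything in it is an assumption for illustration, not how Tor or any real attack works. The observer bins traffic into per-interval cell counts and scores candidate flows with Pearson correlation against the Exit side. Padding is modeled as 0..`padding` random dummy cells per bin, added on the client-to-Guard link only and dropped at the Guard, so the Exit-side view never contains it. The functions `pearson` and `simulate` and all parameters are hypothetical.

```python
# Toy flow-correlation test with hypothetical client-to-Guard padding
# (an illustrative assumption, not Tor's padding design or a real attack tool).
import math
import random
from typing import List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(bins: int = 200, decoys: int = 20, padding: int = 0,
             seed: int = 1) -> Tuple[float, float]:
    """Return (score of the true entry flow, best score among unrelated flows)."""
    rng = random.Random(seed)
    # Cells per time bin of the target circuit, as observed leaving the Exit.
    exit_side = [rng.randint(0, 50) for _ in range(bins)]
    # Same cells on the client-to-Guard link, plus 0..padding dummy cells per bin.
    entry_target = [c + rng.randint(0, padding) for c in exit_side]
    # Unrelated, co-incident flows the eavesdropper also sees on the entry side.
    entry_decoys = [[rng.randint(0, 50) for _ in range(bins)] for _ in range(decoys)]
    true_score = pearson(exit_side, entry_target)
    best_decoy = max(pearson(exit_side, d) for d in entry_decoys)
    return true_score, best_decoy


if __name__ == "__main__":
    for pad in (0, 50, 200):
        true_score, best_decoy = simulate(padding=pad)
        print(f"padding={pad:>3}: true flow {true_score:.3f}, best decoy {best_decoy:.3f}")
```

Without padding the true flow correlates perfectly in this idealized setting. As the padding grows, its score drops toward the range of the unrelated flows. Real observations would start out noisier (sampling, coarse retention), and more co-incident flows would raise the best decoy score.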
It especially justifies minimal amounts of Guard node padding to defend against the single-ended version of the end-to-end correlation attack, also known as the "website traffic fingerprinting attack". The single-ended version is even *more* vulnerable to the properties of background traffic than the double-ended version, and it also has far fewer reliably recognizable traffic features to extract from data streams. See this blog post and its links for more details:
https://blog.torproject.org/blog/experimental-defense-website-traffic-finger...

It's my personal opinion that we should also experiment with Guard padding against the website traffic fingerprinting attack, and see how far that gets us against e2e correlation while we're at it. Unfortunately, current academic religious dogma tends to hold that correlation is unbeatable no matter what. This publication and research bias has already hindered, and will likely continue to hinder, research into viable defenses :(

The second proof wrt tagging attacks scared the crap out of me. However, the "c/n" compromise result at the end hinges crucially on nodes that fail circuits being able to attract additional traffic to make up for it. The bandwidth authorities might do this to a certain extent currently, and will certainly do it if operated in "PID feedback mode". However, it's still not clear that Tor's round-robin circuit selection across 3 Guard nodes wouldn't also end up hampering the attack against specific clients (unless the Guard nodes' keys were stolen and the attack is locally targeted). Either way, it's caused me to drive Nick nuts by pushing hard to include at least *some* kind of simple defense against circuit failure attacks on the client side. How much of that actually survives in 0.2.3.x in a functional form remains to be seen :/

P.S.
Incidentally, you used to be able to get the full copy of the first proof in the old seul archives at http://archives.seul.org/or/dev/Sep-2008/msg00016.html, but since seul is currently down with unknown hardware and disk issues, http://web.archive.org/web/20100416150300/http://archives.seul.org/or/dev/Se... might be the last full public copy other than your repost. I've added the Raccoon on Cc so s/he can hopefully do a full repost if the seul archives end up being destroyed forever.

-- 
Mike Perry