Breaking PRISM and friends
Hi all,

Only a decade or two late to the party... anyway, in the past few days since the PRISM / XKeyscore / etc leaks came to my attention, I've been considering schemes that break the kind of passive, drag-net collection of communications data from listening points on submarine cables and the like. I think I've found one, so I thought I'd share.

Code is in very, very early stages at the moment; https://github.com/lupine/hide-eid has half of a first pass, and a bit of documentation on why it might work. I'm hoping to have it in a state where you could run a pair of VPN providers servicing a few customers each within a few days. As-is, scalability is suspect, though.

The short how-it-works is that it stops the IP header (which typically reveals who is talking to whom, even if the IP payload is encrypted) from being personally-identifying information. If your access ISP runs it, and your hosting ISP runs it too, you benefit from an anonymity set equal to all the source's customers. As long as there's no sniffing going on in the first and last mile, anyway.

Combined with IPsec on those miles, or a vetted path if it's short enough, you can reduce the amount of cable that personally-identifying IP headers are sniffable on, from a few thousand miles, to perhaps a couple of feet - on which you can focus CCTV, if you're *really* paranoid; or even nothing at all, if you have the same box terminating the IPsec tunnel and the hide-eid wrapper/unwrapper.

The theoretical background is from the location/identity separation protocol stuff. Intermediaries don't actually need to know which person (well, EID) the packet is from, or for; they just need to know where to send it (which RLOC) so that a person can pick it up. This scheme is basically that, imagined as a least-effort overlay on the existing IP network. And it doesn't break as many protocols as cgNAT, since source and destination both know the EID of destination and source.

Feedback of any sort is extremely welcome. Particular areas of concern are scaling it (especially given how the crypto works), how the crypto works and if there's a more sensible way (key exchange with M:N different ISPs to take advantage of symmetric ciphers is worrisome), and whether there's a better way to get L/ISP with hidden EIDs deployed to a subset of the internet than a hack of this magnitude. I'm still fairly skeptical that it can make a noticeable difference, but it seems promising enough for me to keep it up in the short term, at least.

If it ends up being useless, there's still tor. There's always tor.

/Nick
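As a rough illustration of the EID/RLOC split described in the message above, here is a minimal Python sketch of the lookup a wrapping box might do. It is not taken from hide-eid: the registry contents, the `Mapping` type and the `rloc_for` / `outer_header` names are all hypothetical, and the addresses are documentation ranges.

```python
import ipaddress
from typing import NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    eid_range: ipaddress.IPv4Network  # customer addresses (EIDs) behind one site
    rloc: ipaddress.IPv4Address       # routable address of that site's wrapper box


# Hypothetical registry; every address here is an RFC 5737 documentation address.
REGISTRY = [
    Mapping(ipaddress.ip_network("192.0.2.0/24"), ipaddress.ip_address("203.0.113.1")),
    Mapping(ipaddress.ip_network("198.51.100.0/24"), ipaddress.ip_address("203.0.113.2")),
]


def rloc_for(eid: str) -> Optional[ipaddress.IPv4Address]:
    """Return the RLOC serving this EID, preferring the longest matching prefix."""
    addr = ipaddress.ip_address(eid)
    matches = [m for m in REGISTRY if addr in m.eid_range]
    if not matches:
        return None
    return max(matches, key=lambda m: m.eid_range.prefixlen).rloc


def outer_header(src_eid: str, dst_eid: str) -> Optional[Tuple[str, str]]:
    """Addresses an intermediary would see on the wire: RLOCs only, never EIDs."""
    src, dst = rloc_for(src_eid), rloc_for(dst_eid)
    if src is None or dst is None:
        return None  # at least one end is not behind a participating site
    return (str(src), str(dst))


print(outer_header("192.0.2.17", "198.51.100.9"))
```

Every customer in 192.0.2.0/24 produces the same outer source address here, which is the anonymity-set property the message describes; the real EIDs would travel inside the encrypted part of the packet.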
How is this conceptually different from a 2-node Tor network, where each ISP operates one node of the pair linking to every other ISP (so there are I^2 pairs)? Additional benefit of using Tor would be mixing and making traffic analysis harder. Threat modelling could draw on the existing research on Tor vulnerabilities.

Also, an ISP could easily, today, run single-node Tor network to obscure end point locations.

The problem does not seem technical at all. The problem is that ISPs have physical addresses. What you need is a floating ISP ... go anywhere, travel light, get in, get out, wherever there's trouble, a man alone.

On 8/3/13 19:57, Nick Thomas wrote:
Hi all,
Only a decade or two late to the party... anyway, in the past few days since the PRISM / XKeyscore / etc leaks came to my attention, I've been considering schemes that breaks the kind of passive, drag-net collection of communications data from listening points on submarine cables and the like. I think I've found one, so I thought I'd share.
Code is in very, very early stages at the moment; https://github.com/lupine/hide-eid has half of a first pass, and a bit of documentation on why it might work. I'm hoping to have it in a state where you could run a pair of VPN providers servicing a few customers each within a few days. As-is, scalability is suspect, though.
The short how-it-works is that it stops the IP header (which typically reveals who is talking to whom, even if the IP payload is encrypted) from being personally-identifying information. If your access ISP runs it, and your hosting ISP runs it too, you benefit from an anonymity set equal to all the source's customers. As long as there's no sniffing going on in the first and last mile, anyway.
Combined with IPsec on those miles, or a vetted path if it's short enough, you can reduce the amount of cable that personally-identifying IP headers are sniffable on, from a few thousand miles, to perhaps a couple of feet - on which you can focus CCTV, if you're *really* paranoid; or even nothing at all, if you have the same box terminating the IPsec tunnel and the hide-eid wrapper/unwrapper.
The theoretical background is from the location/identity separation protocol stuff. Intermediaries don't actually need to know which person (well, EID) the packet is from, or for; they just need to know where to send it (which RLOC) so that a person can pick it up. This scheme is basically that, imagined as a least-effort overlay on the existing IP network. And it doesn't break as many protocols as cgNAT, since source and destination both know the EID of destination and source.
Feedback of any sort is extremely welcome. Particular areas of concern are scaling it (especially given how the crypto works), how the crypto works and if there's a more sensible way (key exchange with M:N different ISPs to take advantage of symmetric ciphers is worrisome), and whether there's a better way to get L/ISP with hidden EIDs deployed to a subset of the internet than a hack of this magnitude. I'm still fairly skeptical that it can make a noticeable difference, but it seems promising enough for me to keep it up in the short term, at least.
If it ends up being useless, there's still tor. There's always tor.
/Nick
Hi,

On Sun, 2013-08-04 at 01:57 -0700, m wrote:
How is this conceptually different from a 2-node Tor network, where each ISP operates one node of the pair linking to every other ISP (so there are I^2 pairs)? Additional benefit of using Tor would be mixing and making traffic analysis harder. Threat modelling could draw on the existing research on Tor vulnerabilities.
It may be misguided, but avoiding I^2 / M:N sessions was a goal. As numbers go, it's big enough to be uncomfortable (there's ~40K ASNs).

There are other differences; with something like hide-eid, the source IP isn't hidden from the destination, and vice-versa. This lets SIP and FTP, for instance, work transparently over it. Also, if a peer or their ISP objects to the traffic, they know who's responsible for it so can take action.

That last may be a disadvantage, depending on your preferences ;). My tor node's exit IP got added to a DNSBL for being the visible peer in abusive HTTP requests within a day or so of being started up.

Traditional tor is also dog-slow by comparison; packets through hide-eid take the same network path as they ordinarily would between wrap and unwrap. I assume that a 2-node tor network would replicate this property?

I don't feel qualified to comment on scalability potential to any large degree, but it's something I've got an eye on. Hopefully, it's easier to scale this kind of limited packet futzing than it is to scale an onion router.
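To put a number on the I^2 / M:N worry above, a small Python calculation. The 40,000 figure is the rough ASN count quoted in the thread; the function names are just for illustration.

```python
def ordered_pairs(n: int) -> int:
    """The I^2 figure from the thread: every site paired with every site."""
    return n * n


def unordered_pairs(n: int) -> int:
    """Distinct site-to-site sessions if each pair shares one session: n(n-1)/2."""
    return n * (n - 1) // 2


asns = 40_000  # rough ASN count quoted in the thread

print(ordered_pairs(asns))    # 1600000000
print(unordered_pairs(asns))  # 799980000
```

Either way the count is in the hundreds of millions to low billions if every pair had to hold a session open, which is the scale that makes pre-established M:N sessions uncomfortable.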
Also, an ISP could easily, today, run single-node Tor network to obscure end point locations.
Would the end-users need to run tor as well, or does it have support for scooping up a whole network's worth of traffic, transparently? I've only gotten as far as running it, not using it...

/Nick
On Sun, 2013-08-04 at 11:20 +0100, Nick Thomas wrote:
It may be misguided, but avoiding I^2 / M:N sessions was a goal. As numbers go, it's big enough to be uncomfortable (there's ~40K ASNs).
Quick update - the code is now in a state where it can tunnel arbitrary IPv4 datagrams, and does path MTU discovery / fragmentation as suggested by RFC6830. Traceroute doesn't work yet, though. IPv6 is TODO; it should be trivial to add support for IPv6 EIDs. IPv6 RLOCs are a tiny bit harder.

Crypto is - very slowly - starting to look sane:

- 160-bit EC private keys per RLOC
- public keys -> registry (for now)
- ECDH for shared secret generation for any RLOC pair
- SHA256 the secret, use as secret key for symmetric cipher
- Fragment packet into packets, if needed
- Each packet gets 128-bit pseudo-random IV ( RAND_pseudo_bytes() )
- aes256gcm block cipher on first 512 bytes of each packet
- On the wire: [ IP header, proto 99 ] [ len(iv+ciphertext+tag) ] [ iv ] [ ciphertext ] [ tag ] [ plaintext ]

Obviously, the current code doesn't scale at all well, but this is in-principle parallelisable, and amenable to hardware crypto use as well. Unloaded, it adds <1ms to rtt. I'm hoping to be able to get it running at ~100Mbit/sec sometime in the next week or two. If I can get it to gigabit rates, I can start talking to small ISPs about running it, opt-in, with a straight face.

If you fancy experimenting with a hide-eid node, just poke me with a public key and a range + RLOC IP (or set up your own pair, of course). I'd quite like to see it spanning large sections of the real Internet successfully. I'd also love to know if you can get it to break any IP protocols; I've only really been playing with TCP and ICMP so far.

More broadly, I've still not been dissuaded from the notion that it plugs a gap in the current range of tools against widespread, generalised internet surveillance. The value of being a member of even a small anonymity set can't be overstated, especially when getting into the set is more-or-less zero effort and zero cost. If anyone can convince me otherwise, well, at least I'd get my evenings (and mornings) back :)

/Nick
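The key derivation and wire layout described in the update above can be sketched in Python roughly as follows. This is not the hide-eid code: the 16-bit big-endian length field and the 16-byte tag are assumptions (the message gives neither), all function names are hypothetical, and the AES-GCM step itself is left out, so `ciphertext` and `tag` are passed in as opaque bytes.

```python
import hashlib
import os
import struct

IV_LEN = 16             # 128-bit IV, per the message above
TAG_LEN = 16            # assumption: full-length 128-bit GCM tag
ENCRYPTED_PREFIX = 512  # only the first 512 bytes of each packet are encrypted
LEN_FMT = "!H"          # assumption: 16-bit big-endian length field


def derive_key(shared_secret: bytes) -> bytes:
    """SHA-256 of the ECDH shared secret gives a 32-byte AES-256 key."""
    return hashlib.sha256(shared_secret).digest()


def split_inner(inner: bytes):
    """Split the inner packet into the part to encrypt and the cleartext tail."""
    return inner[:ENCRYPTED_PREFIX], inner[ENCRYPTED_PREFIX:]


def pack_body(iv: bytes, ciphertext: bytes, tag: bytes, tail: bytes) -> bytes:
    """Build [ len(iv+ciphertext+tag) ] [ iv ] [ ciphertext ] [ tag ] [ plaintext ]."""
    if len(iv) != IV_LEN or len(tag) != TAG_LEN:
        raise ValueError("unexpected IV or tag length")
    protected = iv + ciphertext + tag
    return struct.pack(LEN_FMT, len(protected)) + protected + tail


def unpack_body(body: bytes):
    """Inverse of pack_body; returns (iv, ciphertext, tag, tail)."""
    (protected_len,) = struct.unpack_from(LEN_FMT, body)
    start = struct.calcsize(LEN_FMT)
    protected = body[start:start + protected_len]
    if len(protected) != protected_len or protected_len < IV_LEN + TAG_LEN:
        raise ValueError("truncated or malformed body")
    iv = protected[:IV_LEN]
    ciphertext = protected[IV_LEN:len(protected) - TAG_LEN]
    tag = protected[len(protected) - TAG_LEN:]
    return iv, ciphertext, tag, body[start + protected_len:]


# Demo with placeholder bytes standing in for real AES-GCM output.
inner_packet = os.urandom(600)
to_encrypt, tail = split_inner(inner_packet)
body = pack_body(os.urandom(IV_LEN), to_encrypt, bytes(TAG_LEN), tail)
print(len(body))
```

The body would follow the outer IP header (protocol 99 in the message). Since GCM ciphertext is the same length as its plaintext, using the unencrypted prefix as a stand-in keeps the lengths realistic.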
Combined with IPsec on those miles, or a vetted path if it's short enough, you can reduce the amount of cable that personally-identifying IP headers are sniffable on, from a few thousand miles, to perhaps a couple of feet
According to the speed of light, anything under a certain maximum time from you is local. If all you had was a list of nodes, RTT could be used to determine a global path made up of small hops less likely to be directly monitored themselves. Hop count would rise with longer paths and performance drops... so perhaps only useful for creating local clusters. TTL and RTT above a minimum time are spoofable so not nearly as useful.
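The speed-of-light bound mentioned above can be worked through with a short Python sketch. The function name and the optional `velocity_factor` parameter are illustrative; the 2/3 factor for optical fibre is a common approximation, not a figure from the thread.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s


def max_distance_km(rtt_ms: float, velocity_factor: float = 1.0) -> float:
    """Upper bound on how far away a peer can be, given a measured round trip.

    The signal travels out and back, so only half the RTT counts one way.
    velocity_factor scales c for the medium (assumption: about 2/3 in fibre).
    """
    one_way_s = (rtt_ms / 1000.0) / 2.0
    return C_KM_PER_S * velocity_factor * one_way_s


print(round(max_distance_km(1.0), 1))          # vacuum bound for a 1 ms RTT
print(round(max_distance_km(10.0, 2 / 3), 1))  # fibre-ish bound for a 10 ms RTT
```

This only ever gives an upper bound: a peer can add delay to look farther away, but cannot answer faster than the bound allows, which matches the point about RTT above a minimum being spoofable.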
participants (3)
- grarpamp
- m
- Nick Thomas