
Anonymous wrote:
Here's what is likely going on right now, though its availability to LE is questionable:
NSA probably maintains surveillance of all or nearly all encrypted remailers. They log and archive the content of all traffic. First arrivals of messages in the remailer "cloud" provide source identification in many cases, so
Well... I just realized that the huge amount of spam being relayed through remailers might actually increase remailer security for all of us. I understand that the remailer operators probably have bandwidth limitations, but at least some spam is good. Please do not fight spam as much as you do.
igor
a database is gradually built of all the originating email addresses and IP addresses that have ever sent messages to a remailer. The same is true for the messages exiting the cloud, although many of those go to public lists or newsgroups. Sometimes, perhaps even often, the remailer cloud is so little utilised that one-hop or even multi-hop chained remailings are easily identified as to source and final destination. All goes into the database. Spook-style stylometers develop profiles of the users. Over time, more and more of the unidentified profiles become linked to known users. All it takes is one slip, or one piece of bad luck of low traffic levels. Linkage is also done on a probabilistic basis, many-to-many. Little human intervention is used except to develop and tune the correlation mechanisms. Filters aimed at content that might help correlate messages with senders are more important to the spooks than Echelon Dictionary filters. The priorities in getting a handle on something like the remailer cloud are quite different from the priorities in scanning ordinary traffic for content. Ultimately, though, the regular content filters come into play, but in this ongoing exercise of cat and mouse, the bulk of the effort is directed at developing the combination of surveillance and intelligent processing that can provide the basis for identifying, at least to some knowable probability, the sources and destinations of given remailer messages.
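To make the stylometer idea concrete, here is a toy sketch in Python. It is only an assumption about how such profiling might work in the simplest case: the ten-word FUNCTION_WORDS list, the function names, and the sample sentences are all invented for illustration, and real stylometry would use far richer features. It builds a function-word frequency profile per text and compares profiles by cosine similarity.

```python
import math
import re
from collections import Counter

# A small, assumed list of function words; purely illustrative.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def similarity(p, q):
    """Cosine similarity between two profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

# Hypothetical texts: one signed sample, one anonymous sample in a similar
# style, and one anonymous sample in a different style.
known = profile("It is the case that the traffic is logged, but it is of little use.")
unknown = profile("It is likely that the cloud is watched, but it is hard to prove.")
other = profile("Send chaff to a list and to a newsgroup in a chain of a dozen hops.")

closer = similarity(known, unknown) > similarity(known, other)
```

With these samples the second text scores much closer to the known profile than the third does. A score like this would be one weak input among many, not an identification on its own.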
Source and destination correlation can benefit greatly from knowledge of conventional email correspondent relationships. Most nonprofessional-spook people who send encrypted remailed email to a non-public destination are also likely to correspond with that destination in the clear. Archival logging of traffic is like a time machine - even if one ceases conventional correspondence with another, any past traffic can reveal the correspondence relationship.
Mathematical correlation, given a very large amount of raw data consisting of timestamped message events, can reveal quite a lot about likely correspondence relationships. Even given a remailer network full of chaff, with reordered message pools and such, just running correlations on the end point data - who sent and when, and who received and when - can develop likely correspondent pairs.
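As a rough illustration of correlating end point data, here is a minimal Python sketch. The event format, the delay window, and the name score_pairs are assumptions made for the example, not a description of any real system. It counts, for each sender/recipient pair, how often a receive event follows a send event within a plausible remailer delay.

```python
from collections import Counter

def score_pairs(sends, receives, min_delay, max_delay):
    """Count, per (sender, recipient) pair, how many receive events fall
    within [min_delay, max_delay] seconds after a send event.

    sends and receives are lists of (timestamp, address) tuples.
    """
    scores = Counter()
    for sent_at, sender in sends:
        for received_at, recipient in receives:
            delay = received_at - sent_at
            if min_delay <= delay <= max_delay:
                scores[(sender, recipient)] += 1
    return scores

# Hypothetical observations: alice sends three times and bob receives
# roughly an hour after each one; the other events are unrelated traffic.
sends = [(0, "alice"), (10_000, "alice"), (20_000, "alice"), (5_000, "dave")]
receives = [(3_600, "bob"), (13_500, "bob"), (23_700, "bob"), (16_000, "carol")]

ranked = score_pairs(sends, receives, min_delay=600, max_delay=7_200).most_common()
```

Here the alice/bob pair collects three coincidences and alice/carol only one. A single coincidence means little, but the counts separate as events accumulate, which is why archived traffic and low traffic levels both favour the watcher.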
Be assured that the manipulation of such data relies heavily on probabilistic clustering supported by database mechanisms not seen much in business environments. Instead of having firm relationships within an order or two of magnitude of the terminal node population, I would want something that allows having very large numbers of fuzzy relationships numbering many orders of magnitude greater than the node population. Nodes A, B, C and D send and receive remailer traffic. Everything that A has ever sent is clearly related to A, but each item can and likely would also have a probabilistic relationship to B, C, and D. The probability assignments would be updated as later analysis and later data more clearly identify who may have received which messages. Content comes into play here, too.
At any time it is possible to query the database for the likely correspondents of A, and the likely messages A may have sent to B, C, or D, and get a result ordered by confidence. Similarly, it is possible to query for the likely sender(s) and recipient(s) of a given message. Occasionally, data may be firmed up - meaning the relationships reach higher confidence levels - by access to someone's PC, by whatever means, by stylometry, by specific content of public messages, and by fortuitous traffic analysis successes owing to the paucity of remailer traffic.
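The fuzzy, many-to-many store with nodes A, B, C and D can be sketched in a few lines of Python. This is a guess at the general shape only: the FuzzyLinks class, its method names, and the evidence factors are hypothetical, and a real system would need proper statistical updating rather than simple scaling.

```python
from collections import defaultdict

class FuzzyLinks:
    """Many-to-many store of (message, candidate recipient) weights."""

    def __init__(self):
        # weights[message_id][recipient] -> non-negative evidence weight
        self.weights = defaultdict(dict)
        self.sender_of = {}

    def observe(self, message_id, sender, candidates):
        """Record a message from a known sender, with equal initial weight
        on every candidate recipient."""
        self.sender_of[message_id] = sender
        for recipient in candidates:
            self.weights[message_id][recipient] = 1.0

    def update(self, message_id, recipient, factor):
        """Scale one candidate's weight as new evidence arrives
        (factor > 1 strengthens the link, factor < 1 weakens it)."""
        self.weights[message_id][recipient] *= factor

    def recipients_of(self, message_id):
        """Candidate recipients with normalized probabilities, best first."""
        w = self.weights[message_id]
        total = sum(w.values())
        return sorted(((r, v / total) for r, v in w.items()),
                      key=lambda item: item[1], reverse=True)

    def correspondents_of(self, sender):
        """Sum each candidate's probability over all of sender's messages."""
        totals = defaultdict(float)
        for message_id, who in self.sender_of.items():
            if who == sender:
                for recipient, p in self.recipients_of(message_id):
                    totals[recipient] += p
        return sorted(totals.items(), key=lambda item: item[1], reverse=True)

db = FuzzyLinks()
db.observe("m1", "A", ["B", "C", "D"])
db.observe("m2", "A", ["B", "C", "D"])
db.update("m1", "B", 4.0)   # hypothetical evidence favouring B for m1
db.update("m2", "D", 0.5)   # hypothetical evidence against D for m2

top = db.correspondents_of("A")
```

Querying for the likely correspondents of A returns B, C, D in that order of confidence. Nothing is ever discarded: every link keeps a weight, and "firming up" just means one weight comes to dominate the rest.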
It's also certain that to provide realistic cases that can be fully revealed to measure the efficacy of the various algorithms and methods involved, spooks use the remailer network themselves. That's the only way they can be sure to have any traffic that can be fully analyzed as a yardstick for the guessing games played by their software.
A few rules of thumb result from even cursory examination of the likely environment:
1. Do not send clear messages through the remailers except to public lists and newsgroups. Sending clear messages to private correspondents provides the watchers with rich style and content material linked to a correspondent. It is usually easy then to link it back to you, and a firm correspondent pair is then established in their database.
2. Do not switch between clear and encrypted, remailed communication with the same correspondent. If your correspondent relationships are mapped via clear, conventional mail, those mappings can be applied to your encrypted, remailed mail to greatly narrow down the possible recipients, and quite likely combine with traffic analysis to correlate your outgoing message with one arriving at your correspondent. If the only communications you have with your deep contacts are by encrypted, chained remailer, only fortuitous or statistical correlation analysis or access to your or your correspondent's host machines will tie the two of you together.
3. Do not send traffic to significant correspondents when traffic levels are likely to be low. Good luck at guessing this. Probably the best way to get a handle on traffic levels is to run a remailer yourself.
4. Send lots of chaff. Chain some messages through the remailer cloud every day. If you have time and the ability, write and release something that will allow large numbers of people to send lots of chaff.
5. Ultimately, the only way the remailers will provide what might be described as Pretty Good Security will be when we have software that maintains a regular or random rate of messages to and from the remailer cloud, a stream into which the meaningful messages can be inserted with no visible change in traffic. Until then, the best we can do is try to keep traffic levels up, and to send and receive frequently enough to frustrate end-to-end traffic analysis.
6. Don't send anything that can have grave consequences.
7. Take names. Always take names. Some day...
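Rule 5 asks for software that keeps a steady stream going into the cloud and slips real messages into it. A minimal Python sketch of that idea follows. The CoverTrafficSender class, the PACKET_SIZE value, and the tick-based scheduling are assumptions for illustration; the sketch does not talk to any real remailer, and it assumes every packet is encrypted afterwards so that padding and chaff cannot be told apart from real traffic.

```python
import os
from collections import deque

PACKET_SIZE = 1024  # assumed fixed size so real and dummy messages look alike

class CoverTrafficSender:
    """Emits exactly one fixed-size packet per tick, real or dummy."""

    def __init__(self):
        self.queue = deque()

    def submit(self, payload: bytes):
        """Queue a real message; it waits for the next free tick."""
        if len(payload) > PACKET_SIZE:
            raise ValueError("payload larger than one packet")
        self.queue.append(payload)

    def tick(self) -> bytes:
        """Called once per fixed interval. Returns the packet to send."""
        if self.queue:
            body = self.queue.popleft()
            # Zero-pad to the fixed size. A real client would encrypt the
            # padded body before sending, otherwise the padding is visible.
            return body + bytes(PACKET_SIZE - len(body))
        return os.urandom(PACKET_SIZE)  # dummy message (chaff)

sender = CoverTrafficSender()
sender.submit(b"meet at noon")
packets = [sender.tick() for _ in range(5)]
```

An observer counting packets sees five of equal size at equal intervals whether or not a real message was queued. The price is latency, since a real message must wait for its tick, and bandwidth, since the stream runs even when there is nothing to say.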
FUDBusterMonger
It Ain't FUD til I SAY it's FUD!
- Igor.
participants (2): Anonymous, ichudov@Algebra.COM