At 11:22 PM 4/25/03 -0400, Patrick Chkoreff wrote:
All this talk about meatspace suggests we should have some spare ribs with those beers.
Ribs from sacred cows are the best.
Of course, writing style and personality can help expose private key hijacking and impersonation somewhat, though even that can be counterfeited. I can recognize a post by JP May with my eyes squinted,
just looking at the shapes of the letters and paragraphs. But I bet I could do an utterly, mind-bogglingly good impersonation if I tried.
Try John Young sometime, after those beers. And Hettinga is probably a perl script :-)

But seriously, you've just mentioned what's called "textual analysis". Spelling errors and other idiosyncratic choices can be used to "pierce the veil" of anonymity. That's what did in Dr. Kaczynski, who pissed on the FBI for over a decade, until his brother recognized his text.

Running text through automatic translators (engl->german->engl) has been suggested, but deeper signatures may remain. It probably wouldn't have helped Dr. K.

---
"A democracy cannot exist as a permanent form of government. It can only exist until the voters discover that they can vote themselves money from the Public Treasury. From that moment on, the majority always votes for the candidate promising the most benefits from the Public Treasury with the result that a democracy always collapses over loose fiscal policy always followed by dictatorship." --Alexander Fraser Tytler
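[Editor's note: the "textual analysis" Variola describes can be sketched in a few lines. This is a hypothetical toy, not a forensic tool: character-trigram frequency profiles compared by cosine similarity, one of the simplest stylometric fingerprints. Real stylometry adds function-word rates, syntax, vocabulary richness, and much more.]

```python
# Toy stylometric comparison: character-trigram frequency profiles
# compared by cosine similarity. Idiosyncratic habits of spelling
# and phrasing shift these frequencies, leaving a measurable
# fingerprint even in modest amounts of text.
from collections import Counter
from math import sqrt

def trigram_profile(text):
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(text[i:i+3] for i in range(len(text) - 2))

def cosine(p, q):
    dot = sum(p[k] * q[k] for k in p if k in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

a = trigram_profile("The industrial revolution and its consequences have been a disaster")
b = trigram_profile("The industrial revolution and its results have been a disaster")
c = trigram_profile("Completely unrelated subject matter, for instance cooking recipes")

# Near-duplicate texts score far higher than unrelated ones:
assert cosine(a, b) > cosine(a, c)
```

Comparing error *habits* this way is what makes very short anonymous messages hard to attribute: the profiles are too sparse to separate authors reliably.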
On Sat, 26 Apr 2003, Major Variola (ret) wrote:
But seriously, you've just mentioned what's called "textual analysis". Spelling errors and other idiosyncratic choices can be used to "pierce the veil" of anonymity. That's what did in Dr. Kaczynski, who pissed on the FBI for over a decade, until his brother recognized his text.
Couldn't there be a standard English-based language, "Anonglish", with a subset of English grammatical rules, human-readable (though maybe with its own idiosyncrazies) and machine-parseable, whose appearance would not give many more clues than that Anonglish was used? Something where grammar rules would be few, strict, and easy to machine-check, spelling as well, and still be readable to anyone who knows "standard" English? Possibly with a "translator" from "normal" English (of course with the necessity to read the translation, correct eventual semantic mistakes introduced by rearranging the words, and "anonspell-check" the result)?

That would shift textual analysis from comparing the errors characteristic of a given person to comparing trains of thought, which is much more difficult, much less of a "reliable proof", and practically impossible for very short messages.
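[Editor's note: Shaddack's "Anonglish" translator doesn't exist, but the normalization half of the idea is easy to sketch. Everything below - the lexicon, the rules - is hypothetical and far smaller than a usable design would need; a real version would also need a restricted, machine-checkable grammar.]

```python
# A minimal sketch of the "Anonglish" idea: map free-form English
# onto a canonical form with one accepted spelling per word, no
# contractions, and normalized case and whitespace. The tiny demo
# lexicon here is purely illustrative; punctuation handling and
# grammar checking are omitted.
import re

CANONICAL = {  # one spelling per concept (hypothetical demo entries)
    "colour": "color", "grey": "gray",
    "can't": "cannot", "won't": "will not", "don't": "do not",
}

def to_anonglish(text):
    text = text.lower()
    text = re.sub(r"\s+", " ", text).strip()  # exactly one space everywhere
    return " ".join(CANONICAL.get(w, w) for w in text.split(" "))

print(to_anonglish("I  can't  pick a GREY colour"))
# -> "i cannot pick a gray color"
```

The point is that two writers with different spelling and spacing habits emit byte-identical output for the same sentence, so only word choice and "trains of thought" remain to analyze.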
On Saturday, April 26, 2003, at 06:54 PM, Thomas Shaddack wrote:
On Sat, 26 Apr 2003, Major Variola (ret) wrote:
But seriously, you've just mentioned what's called "textual analysis". Spelling errors and other idiosyncratic choices can be used to "pierce the veil" of anonymity. That's what did in Dr. Kaczynski, who pissed on the FBI for over a decade, until his brother recognized his text.
Couldn't there be a standard English-based language, "Anonglish", with a subset of English grammatical rules, human-readable (though maybe with its own idiosyncrazies) and machine-parseable, which appearance would not give many more clues than that Anonglish was used? Something where grammar rules would be few, strict, and easy to machine-check, spelling as well, and still be readable to anyone who knows "standard" English? Possibly with a "translator" from "normal" English (of course with the necessity to read the translation, correct eventual semantical mistakes introduced by rearranging the words, and "anonspell-check" the result)?
That would put textual analysis from comparing the errors characteristic for a given person to comparing of trains of thoughts, which is much more difficult, much less being a "reliable proof", and practically impossible for very short messages.
REQUEST SPEC SOONEST. IDEA RELAYED BUPERS SUBJECT APPROVAL COMMAND.

There are various synthetic languages, not the least of which is the form of "milspeak" used for quasi-literate military memos. But of course people aren't going to learn new human languages for such an ephemeral and mostly useless reason as to hide their textual clues.

Kaczynski got nailed because his rants were so long, running to many newspaper pages (as they were printed, at the FBI's request or his request, or both, I forget the details) and were filled with a lot more than just grammatical and stylistic clues: the rants had his political views, his analysis of history, etc. It's doubtful that K. would have had any interest in trying to write in some synthetic language, stripped of various stylistic choices and options. Or that we would want to.

By the way, there was a book out a few years back by an academic who specializes in "forensic text analysis," e.g., analyzing the text of Shakespeare, Pynchon, etc. to do this kind of analysis. (He lived in Soquel, a town near me, and he analyzed letters by a "Wanda Tinasky" which were believed by some to be actually written by Thomas Pynchon, the famously reclusive author who, by coincidence (or not?) lived for a decade a couple of ridges over from me, in Aptos, also near Soquel. Small world.) Google should turn up the author for those interested in finding the book.

"We are at war with Oceania. We have always been at war with Oceania." "We are at war with Eurasia. We have always been at war with Eurasia." "We are at war with Iraq. We have always been at war with Iraq." "We are at war with France. We have always been at war with France."
You could try the Dialectizer http://rinkworks.com/dialect/ Example: On Saturday 26 April 2003 09:42 pm, Tim "Ahh Be Bad" May wrote, dig dis:
REQUEST SPEC SOONEST. IDEA RELAYED BUPERS SUBJECT APPROVAL COMMAND.
Dere is various syndetic languages, not da damn least uh which be de fo'm uh "milrap" used fo' quasi-literate military memos.
But uh course sucka's ain't goin' t'learn new human languages fo' such an ephemeral and mostly useless reason as t'hide deir textual clues.
Kascinski gots nailed cuz' his rants wuz so's long, runnin' t'many newssheet pages (as dey wuz printed, at da damn FBI's request o' his request, o' bod, ah' fo'get da damn details) and wuz filled wid some lot
dan plum grammatical and stylistic clues, dig dis: de rants had his
views, his analysis uh histo'y, etc. Co' got d' beat! It's doubtful dat K. would gots had any interest in tryin' t'scribble in some syndetic language, stripped uh various stylistic choices and opshuns.
Or dat we would wanna.
By de way, dere wuz some scribblin' out some few years back by an academic who specializes in "fo'ensic text analysis," e.g. What it is, Mama!, analyzin' de text of Shakespeare, Pynchon, etc. Co' got d' beat! t'do dis kind'a analysis. (He
mo'e political lived in
Soquel, some town near me, and he analyzed letters by some "Wanda Tinax'y" which wuz recon'd by some t'be actually written by Domas Pynchon, de famously reclusive audo' who, by coincidence (o' not?) lived fo' a decade some couple uh ridges upside from me, in Aptos, also near Soquel. Small wo'ld. Google should turn down de audo' fo' dose interested in findin' de scribblin'.
"We is at war wid Oceania. WORD! We gots always been at war wid Oceania. WORD!" "We is at war wid Eurasia. WORD! We gots always been at war wid Eurasia. WORD!" "We is at war wid Iraq. Ah be baaad... We gots always been at war wid Iraq. Ah be baaad... "We is at war wid France. We gots always been at war wid France."
-- Neil Johnson http://www.njohnsn.com PGP key available on request.
On Saturday, April 26, 2003, at 10:58 PM, Neil Johnson wrote:
You could try the Dialectizer
Example:
On Saturday 26 April 2003 09:42 pm, Tim "Ahh Be Bad" May wrote, dig dis:
REQUEST SPEC SOONEST. IDEA RELAYED BUPERS SUBJECT APPROVAL COMMAND.
Dere is various syndetic languages, not da damn least uh which be de fo'm uh "milrap" used fo' quasi-literate military memos.
Looks a lot like the Ebonicizer, which, those who search the archives can confirm, I used a few times several years ago. Pity da fool! --Tim May
At 12:58 AM 04/27/2003 -0500, Neil Johnson wrote:
You could try the Dialectizer
One of the early web anonymizing systems was the Web Canadianizer, eh? It retrieved and dialectized web pages, and while it wasn't as thorough as the Anonymizer, it worked pretty well, eh?

By the way, it's amusing and sad to see that one of the features of the current Zero Knowledge Systems products is a Content Filtering feature which not only blocks Sex and Violence but also Criminal Skills. Presumably this means that it would block any access to ZKS Freedom, the Anonymizer, and most other pages with Hacker information, unless they've hacked it to whitelist their own services.
Ohhh I just can't help it. On Setoordey 26 Epreel 2003 09:42 pm, Teem Mey vrute-a:
REQOoEST SPEC SOONEST. IDEA RELEYED BOoPERS SOoBJECT EPPROFEL COMMEND.
Zeere-a ere-a fereeuoos synzeeteec lungooeges, nut zee leest ooff vheech is
furm ooff "meelspeek" used fur qooesee-leeterete-a meelitery memus. Um gesh dee bork, bork!
Boot ooff cuoorse-a peuple-a eren't gueeng tu leern noo hoomun lungooeges fur sooch un iphemerel und mustly useless reesun es tu heede-a zeeur textooel clooes. Um gesh dee bork, bork!
Kesceenski gut neeeled becoose-a hees runts vere-a su lung, roonneeng tu muny noospeper peges (es zeey vere-a preented, et zee FBI's reqooest oor hees reqooest, oor but, I furget zee deteeels) und vere-a feelled veet a lut mure-a thun joost gremmeteecel und styleestic clooes: zee runts hed hees puleeticel feeoos, hees unelysees ooff heestury, itc. It's duoobtffool thet K. vuoold hefe-a hed uny interest in tryeeng tu vreete-a in sume-a synzeeteec lungooege-a, streepped ooff fereeuoos styleestic chueeces und oopshuns. Um gesh dee bork, bork!
Oor thet ve-a vuoold vunt tu.
By zee vey, zeere-a ves a buuk oooot a foo yeers beck by un ecedemeec vhu speceeelizes in "furenseec text unelysees," i.g., unelyzeeng zee text ooff Shekespeere-a, Pynchun, itc. tu du thees keend ooff unelysees. Um gesh dee bork, bork! (He-a leefed in Suqooel, a toon neer me-a, und he-a unelyzed letters by a "Vunda Teenesky" vheech vere-a beleeefed by sume-a tu be-a ectooelly vreettee by Thumes Pynchun, zee femuoosly reclooseefe-a oothur vhu, by cueencidence-a (oor nut?) leefed fur a decede-a a cuoople-a ooff reedges oofer frum me-a, in Eptus, elsu neer Suqooel. Smell vurld. Bork bork bork! Guugle-a shuoold toorn up zee oothur fur
zee thuse-a interested in
feending zee buuk.
"Ve-a ere-a et ver veet Ooceuneea. Ve-a hefe-a elveys beee et ver veet Ooceuneea." "Ve-a ere-a et ver veet Iooreseea. Ve-a hefe-a elveys beee et ver veet Iooreseea." "Ve-a ere-a et ver veet Ireq. Ve-a hefe-a elveys beee et ver veet Ireq. "Ve-a ere-a et ver veet Frunce-a. Ve-a hefe-a elveys beee et ver veet Frunce-a."
-- Neil Johnson http://www.njohnsn.com PGP key available on request.
Sorry, there wasn't an option to translate into John Young dialect. :) -- Neil Johnson http://www.njohnsn.com PGP key available on request.
On Sat, 26 Apr 2003, Tim May wrote:
But of course people aren't going to learn new human languages for such an ephemeral and mostly useless reason as to hide their textual clues.
Hence my specification for compatibility with English (so anyone who speaks English would understand the text - my proposal is de facto drastically simplified English), and the requirement for a machine translator from English to Anonglish (so the bulk of the work would be done by the machine). If you want something to be actually used, it has to be simple.
It's doubtful that K. would have had any interest in trying to write in some synthetic language, stripped of various stylistic choices and options.
Or that we would want to.
Not for "normal" communication. However, special cases where long-term protection of the nym has to be achieved, and where the user has other nyms that could lead to the discovery of his True Name, would require this approach.
On Sun, Apr 27, 2003 at 03:54:19AM +0200, Thomas Shaddack wrote:
Couldn't there be a standard English-based language, "Anonglish", with a subset of English grammatical rules, human-readable (though maybe with its own idiosyncrazies) and machine-parseable, which appearance would not give many more clues than that Anonglish was used? Something where grammar
I thought it was called "wire service reporting style." :) -Declan
Thomas Shaddack wrote:
On Sat, 26 Apr 2003, Major Variola (ret) wrote:
But seriously, you've just mentioned what's called "textual analysis". Spelling errors and other idiosyncratic choices can be used to "pierce the veil" of anonymity. That's what did in Dr. Kaczynski, who pissed on the FBI for over a decade, until his brother recognized his text.
Couldn't there be a standard English-based language, "Anonglish", with a subset of English grammatical rules, human-readable (though maybe with its own idiosyncrazies) and machine-parseable, which appearance would not give many more clues than that Anonglish was used? Something where grammar rules would be few, strict, and easy to machine-check, spelling as well, and still be readable to anyone who knows "standard" English? Possibly with a "translator" from "normal" English (of course with the necessity to read the translation, correct eventual semantical mistakes introduced by rearranging the words, and "anonspell-check" the result)?
That would put textual analysis from comparing the errors characteristic for a given person to comparing of trains of thoughts, which is much more difficult, much less being a "reliable proof", and practically impossible for very short messages.
I'm starting to do something slightly similar, for different reasons. It's part of a deniable encryption project.

If you have perfect compression, and you encrypt a message which has been compressed, any decryption will look sensible. This means that you don't need long keys, that brute-force attacks won't work, and that any supposed decryption is deniable. Unfortunately perfect compression is theoretically impossible to achieve, and difficult even to usefully approach.

What _is_ possible, at least in theory, is super-perfect compression, wherein the set of possible messages is reduced. The way I am attempting to do it is quite similar to your proposal, but there's a long way to go yet!

There's an August 2001 thread in the sci.crypt.research archives called "Grammar/dictionary-based compression for deniability:" in which I explain a bit more about it (or rather, about an earlier version). The "super" bit solves, at least in theory, the unicity problems.

-- Peter Fairbrother
At 03:42 PM 4/28/03 +0100, Peter Fairbrother wrote:
If you have perfect compression, and you encrypt a message which has been compressed, any decryption will look sensible.
You do understand that building this kind of compressor implies passing the Turing test, right? For the messages to be sensible, they have to have some underlying meaning that makes sense. This isn't just compression in the sense of fast implementations of statistical models of text....
Peter Fairbrother
--John Kelsey, kelsey.j@ix.netcom.com PGP: FA48 3237 9AD5 30AC EEDD BBC8 2A80 6948 4CAA F259
At 03:42 PM 4/28/03 +0100, Peter Fairbrother wrote:
If you have perfect compression, and you encrypt a message which has been compressed, any decryption will look sensible.
You do understand that building this kind of compressor implies passing the Turing test, right? For the messages to be sensible, they have to have some underlying meaning that makes sense. This isn't just compression in the sense of fast implementations of statistical models of text....
Layer the encryptions then. A good ciphertext looks random. Take a ciphertext and encrypt it again, you get a - say - cipher2text. A decryption of cipher2text with any key then looks like a potential ciphertext. Is there a hole in this claim?
According to Schneier, doing this is a bad idea - (or so I recall from the A.P. book which I've not reread in quite a while - I may be wrong) if you use the same (or a similar) cypher. I.e.: blowfish(blowfish(plaintext,key1),key2) is bad, but rsa(blowfish(plaintext,key1),privatekey) is ok.

----------------------Kaos-Keraunos-Kybernetos---------------------------
 + ^ + :NSA got $20Bil/year |Passwords are like underwear. You don't /|\
  \|/  :and didn't stop 9-11|share them, you don't hang them on your/\|/\
<--*-->:Instead of rewarding|monitor, or under your keyboard, you  \/|\/
  /|\  :their failures, we  |don't email them, or put them on a web \|/
 + v + :should get refunds! |site, and you must change them very often.
--------_sunder_@_sunder_._net_------- http://www.sunder.net ------------

On Wed, 30 Apr 2003, Thomas Shaddack wrote:
Layer the encryptions then. A good ciphertext looks random. Take a ciphertext and encrypt it again, you get a - say - cipher2text. A decryption of cipher2text with any key then looks like a potential ciphertext.
Is there a hole in this claim?
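[Editor's note: Shaddack's layering claim can be illustrated with a toy stream cipher - SHA-256 in counter mode here, chosen only for brevity, not a vetted construction. Since each layer is an XOR with a pseudorandom keystream, stripping the outer layer with *any* key produces bytes shaped like a plausible inner ciphertext.]

```python
# Two-layer stream encryption. Peeling the outer layer with the
# wrong key still yields random-looking bytes of the right length,
# i.e. something that could pass for an inner ciphertext.
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def xor_encrypt(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so this both encrypts and decrypts
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

msg = b"attack at dawn"
layer1 = xor_encrypt(msg, b"inner-key")
layer2 = xor_encrypt(layer1, b"outer-key")

# The correct outer key recovers the true inner ciphertext...
assert xor_encrypt(layer2, b"outer-key") == layer1
# ...but a wrong outer key also yields something ciphertext-shaped:
fake_inner = xor_encrypt(layer2, b"wrong-key")
assert len(fake_inner) == len(layer1) and fake_inner != layer1
```

Note that this does not answer Sunder's caution: with a stream cipher, two XOR layers collapse into a single XOR with the combined keystream, so layering the *same* construction adds no strength - only deniability about which layer, if any, has been peeled.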
John Kelsey wrote:
At 03:42 PM 4/28/03 +0100, Peter Fairbrother wrote:
If you have perfect compression, and you encrypt a message which has been compressed, any decryption will look sensible.
You do understand that building this kind of compressor implies passing the Turing test, right? For the messages to be sensible, they have to have some underlying meaning that makes sense. This isn't just compression in the sense of fast implementations of statistical models of text....
I do realise that. More, it has to be able to fake the sender, not just a random human. I'm not trying to build that sort of compressor tho' - but see my ps.

The compressor I'm beginning to build now does not have to pass a Turing test directly. It can only compress a limited subset of possible messages, and if that subset is small it's easy to see that it can be done. Say your possible messages are:

Attack at dawn
Attack at dusk
Retreat at dawn
Retreat at dusk

Assign a number to the verb, and a number to the "time" (not being a grammarian I don't know offhand what that part of speech is called). In this limited case that's just two bits, so eg "Attack at dawn" compresses as 0x00. Now encrypt that 0x00, using eg an XOR with a key of 0x10, to give ciphertext 0x10. Decrypting that with key 0x00 gives message 0x10 - which decompresses to "Attack at dusk", a plausible decryption*.

There are further considerations when variable sentence structures, multi-sentence messages (and lots more things) are considered, of course. For instance, longer messages have to be self-consistent, which can be done using closeness arrays and best-fit techniques. And doing it on a wider scale is harder, and a whole lot of work...

* However, if "Attack at dusk" is an unlikely message because of real-world events, eg you have already won the battle, then the decryption loses some plausibility... There are several ways around that.

First is to have a "godlike" compressor which knows everything in the real world, or at least everything any sender is likely to send, so that _all_ possible decryptions are real-world plausible, but that's not within my ability to write. It's impossible anyway (unless you're God).

Second is to just accept that only a portion of possible decryptions will be real-world plausible (most, if not all, should be language-plausible and self-consistent-plausible). It shouldn't be hard to get a small proportion to be rwp.
This is still very useful, as an attacker can't distinguish between a brute-forced set of real-world plausible decryptions (a subset of all possible decryptions, which should be large enough to contain many examples of contradictory decryptions), and a purportedly real decryption can be challenged by producing a different real-world plausible decryption, or preferably ten thousand of them.

Having a fake key that decrypts to a rwp decryption can be done, if the fake key is prearranged before the message is sent. Useful when lots of messages are encrypted with the same key. If you can check the decryption first, you can also afterwards select a key that will give a rwpd.

Third is to try and get almost all decryptions to be rwp, using complex techniques (!) and the fact that the set of messages that can be sent is limited. For instance, if it was limited as in my example above and you wanted to tell someone that you couldn't go on a promised date this evening, you would send "Retreat at dusk". This is a very contrived example, of course.

Unfortunately you still can't give a randomly-chosen-afterwards key which will _always_ give a rwpd, which would be _very_ nice to do. I'm investigating a few possible ways to do that, perhaps just to do it effectively without 100% of possible decryptions being rwp, but I haven't gotten any results worth repeating yet. And yes, I do know the theory that says it's impossible. Change the conditions a little and the theory might not be applicable any more.

-- Peter Fairbrother

ps I did some experiments a couple of years ago and got (some) rwp decryptions in most 60-word messages, and in some 200-word messages. The parser used was surprisingly important. Only tried at most a few hundred trial decryption/decompressions per message, but I didn't get anyone else to check the rwp, so the results may have been a bit subjective. That was not super-perfect tho', just an attempt to approach perfect.
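[Editor's note: Fairbrother's four-message example is small enough to run verbatim. A sketch, with an arbitrary bit layout (his post assigns the bits differently): because every 2-bit value decodes to a valid message, every key yields a grammatical, in-domain decryption.]

```python
# "Super-perfect" compression over a four-message space:
# every possible code is a valid message, so every trial key
# produces a plausible plaintext -- the heart of the deniability.
VERBS = ["Attack", "Retreat"]   # high bit
TIMES = ["dawn", "dusk"]        # low bit

def compress(msg: str) -> int:
    verb, _, time = msg.split()
    return (VERBS.index(verb) << 1) | TIMES.index(time)

def decompress(code: int) -> str:
    return f"{VERBS[(code >> 1) & 1]} at {TIMES[code & 1]}"

def crypt(code: int, key: int) -> int:
    return (code ^ key) & 0b11   # 2-bit XOR "cipher" (self-inverse)

ct = crypt(compress("Attack at dawn"), key=0b10)

# The true key recovers the true message; every wrong key still
# decodes to a grammatical, in-domain message:
for key in range(4):
    print(key, "->", decompress(crypt(ct, key)))
```

As the footnote in the post says, all four decryptions are language-plausible, but real-world plausibility (has the battle already been won?) is a separate, harder property.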
participants (9)
- Bill Stewart
- Declan McCullagh
- John Kelsey
- Major Variola (ret)
- Neil Johnson
- Peter Fairbrother
- Sunder
- Thomas Shaddack
- Tim May