ID of anonymous posters via word analysis?
In article <Pine.3.87.9310291032.A24998-0100000@crl.crl.com> arthurc@crl.crl.com writes:
I think that identification by buzzwords, habitual misspellings, etc. could be used to identify anonymous posters. Sentence structure is also revealing. Le style, c'est l'homme, said Voltaire. Of course, it all comes down to how much time and effort you want to put into proving, say, that SBoxx=LDetweiler.
I had a go at this just for fun when an8785 was doing his thing. I'm pretty sure I identified him correctly in the end. (The guy I thought it was, when I asked him, said 'If I were I wouldn't tell you', whereas all the other people I suspected, though less strongly, denied it violently, heh heh heh)

I think this sort of analysis could be automated to a reasonable extent, to cut out the Type I errors that the guys who did the Shakespeare/Bacon analysis made. It's very easy to fool yourself if you don't have predefined criteria of comparison and a rigid marking scheme.

I'm fairly sure that a sufficiently detailed analysis looking at enough different points of style would still catch someone's fingerprint even if they went out of their way to disguise their postings. The only approach I can think of that would be successful in hiding individual style is for person A to write something, then for person B to read it quickly and attempt to write something with the same semantic content; of course it will then have B's grammar, phraseology and punctuation idiosyncrasies. (And this only works if B is not a net poster, otherwise you recognise B and work out who his friends are :-) )

G
--
Personal mail to gtoal@gtoal.com (I read it in the evenings)
Business mail to gtoal@an-teallach.com (Be careful with the spelling!)
Faxes to An Teallach Limited: +44 31 662 4678  Voice: +44 31 668 1550 x212
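To make the "predefined criteria of comparison and a rigid marking scheme" idea concrete, here is a minimal sketch of how such a comparison might be automated. Everything in it is an assumption for illustration: the FUNCTION_WORDS list, the crude sentence-length feature and the names fingerprint, cosine_similarity and rank_candidates are hypothetical, and nobody in the thread describes an actual implementation. A real analysis would need far more points of style and far more text per author.

```python
import math
import re
from collections import Counter

# Hypothetical fixed marking scheme, chosen BEFORE looking at any suspect:
# a small set of common function words whose usage rates vary by writer.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in",
                  "that", "is", "it", "i", "but", "not"]


def fingerprint(text):
    """Return a feature vector: function-word rates plus mean sentence length."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    rates = [counts[w] / total for w in FUNCTION_WORDS]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    mean_sentence_len = len(words) / max(len(sentences), 1)
    # Crude scaling so sentence length does not swamp the word rates.
    return rates + [mean_sentence_len / 100.0]


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(anonymous_text, known_texts):
    """Rank known authors by similarity to the anonymous text, best first."""
    target = fingerprint(anonymous_text)
    scores = {name: cosine_similarity(target, fingerprint(sample))
              for name, sample in known_texts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling rank_candidates(anon_post, {"suspect1": text1, "suspect2": text2}) returns the suspects ordered by score. A high score is only a hint, not proof; fixing the feature list in advance is what guards against the fool-yourself problem described above.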
One could also imagine a 'semantic scrambler' analogous to the word analysis program, but designed to defeat it, by randomly altering the syntax of a post. The output might have to be tweaked afterwards, in order to restore some sense to it, but it would be a sort of ASCII version of the cutting-the-words-out-of-magazines style of ransom note.

It might just be a huge lookup table of canned phrases that get swapped in to replace your 'unique,' identifiable sentences. It seems like a dedicated global search and replace, combined with some sort of die throw to dictate sentence structure, might be enough to screw up word analysis, actually...

Actually, the scrambler wouldn't have to be random--it could always produce output with the same word analysis signature. As long as a bunch of people were using it, or had access to it, you'd have deniability.

E. Jay O'Connell____________________________________________________
"God does not play dice with the Universe"--A Einstein
"No, she plays SuperScratch-Card Wingo (TM)"--Me.
____________________________________________________________________
Information Wants to Be Free          PGP Public Key available by Finger
E. Jay O'Connell <ejo@world.std.com> wrote:
It might just be a huge lookup table of canned phrases that get swapped in to replace your 'unique,' identifiable sentences. It seems like a dedicated global search and replace, combined with some sort of die throw to dictate sentence structure might be enough to screw up word analysis, actually...
Actually, the scrambler wouldn't have to be random--it could always produce output with the same word analysis signature. As long as a bunch of people were using it, or had access to it, you'd have deniability.
I've seen a few programs which do this, but they were mostly for humor value. The program would pick out certain words or phrases and swap them with other words or phrases in its database, mostly cliches and other strange word usage (such as "Like, wow, gag me with a spoon" etc...). The output was rather humorous, but most of the meaning was still preserved.

I saw several of them used a while ago during the Jon Fether fiasco on Usenet. (If any of you saw that - a little 14-year-old got daddy's modem and found a free internet site, and then started flamewars on several Usenet groups. A few people took his flame posts and ran them through their "filters" and then re-posted them.)

Anyway, it probably wouldn't be too hard to just swap words with synonyms or reorder or replace certain prepositional phrases.
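As a rough illustration of the lookup-table idea, here is a minimal sketch of a non-random scrambler: every word or phrase in a group of interchangeable ones is replaced by one canonical choice, so different writers' habits collapse to the same output. The CANONICAL table and the scramble function are hypothetical, invented for this example; they are not taken from any of the filter programs mentioned above, and a usable table would have to be enormously larger.

```python
import re

# Hypothetical lookup table: personal word habits -> one shared canonical form.
# None of the replacement values is itself a key, so scrambling twice
# gives the same result as scrambling once.
CANONICAL = {
    "guy": "person",
    "guys": "people",
    "pretty": "fairly",
    "screw up": "defeat",
    "thru": "through",
    "heh": "ha",
}


def scramble(text, table=CANONICAL):
    """Replace every table entry found in text with its canonical form."""
    # Try longer entries first so "guys" wins over "guy" and
    # multi-word phrases win over their component words.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        original = match.group(0)
        new = table[original.lower()]
        # Keep a leading capital so sentence starts still look right.
        return new.capitalize() if original[0].isupper() else new

    return pattern.sub(replace, text)
```

Because the mapping is fixed rather than driven by a die throw, two posters who write "The guy did it" and "The person did it" both end up with identical output, which is the deniability property suggested earlier in the thread. It does nothing about sentence structure or punctuation, which a detailed analysis could still pick up.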
participants (3)
- Edward J OConnell
- gtoal@an-teallach.com
- Matthew J Ghio