Textual analysis

John Kelsey kelsey.j at ix.netcom.com
Sun Dec 14 07:36:02 PST 2003


At 09:44 AM 12/13/03 -0600, Harmon Seaver wrote:
...
>   And what is my supposed "three-space paragraph lead-ins?" The concept of
>textual analysis to prove ID has always amused me. A competent writer can
>easily change writing styles from moment to moment. I well recall a
>university English lit prof almost accusing me of plagiarism when I wrote a
>piece mimicking Faulkner and doing so well enough that the prof actually
>started looking thru his works trying to find it.

Textual analysis correctly identified the author of _Primary Colors_, 
though that was from a pretty small field of people with the right level of 
inside knowledge.  Does anyone know whether there have been real randomized 
trials of any of the textual analysis software or techniques?  E.g., is 
this an identification technique like DNA, or is it an identification 
technique like retrieving repressed memories under hypnosis (or, 
equivalently, consulting a Ouija board)?

It's not obvious to me how you'd change your writing style to defeat these 
textual analysis schemes--would it really be as simple as changing the 
average length of sentences and getting rid of the big words, or would 
there still be ways to determine your identity from that text?  I'm 
thinking especially of long discussions of technical topics--if I wrote a 
five page essay on what to look at when trying to cryptanalyze a new block 
cipher, I think it would be hard to keep readers who knew me from having a 
pretty good guess about the author, even if I tried changing terms, being 
more mathematical and less conversational, etc.  (Though this is more of a 
problem with humans familiar with my writing style, rather than with 
automated analysis.)
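
To make the "average length of sentences" and "big words" idea concrete, 
here is a minimal sketch of how those two surface features might be 
measured.  The function name style_features, the seven-letter cutoff for a 
"big" word, and the crude sentence splitter are all my own assumptions for 
illustration; I don't know what features any real textual analysis software 
actually uses.

```python
import re

def style_features(text, long_word_len=7):
    """Return two crude style features for a piece of text:
    mean sentence length in words, and the fraction of "big" words.
    The cutoff long_word_len is an arbitrary illustrative choice."""
    # Naive sentence split: runs of . ! ? followed by whitespace or end of text.
    # Abbreviations such as "e.g." will be miscounted as sentence breaks.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Words are runs of letters and apostrophes; digits and symbols are ignored.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "long_word_frac": 0.0}
    long_words = [w for w in words if len(w) >= long_word_len]
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "long_word_frac": len(long_words) / len(words),
    }

if __name__ == "__main__":
    sample = "I like cats. Cryptanalysis is difficult!"
    print(style_features(sample))
```

Both numbers are easy for a writer to shift deliberately, which is the 
reason for asking whether anything harder to disguise would still be left in 
the text.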

>Harmon Seaver
>CyberShamanix
>http://www.cybershamanix.com

--John Kelsey, kelsey.j at ix.netcom.com
PGP: FA48 3237 9AD5 30AC EEDD  BBC8 2A80 6948 4CAA F259




