Textual analysis

Adam Shostack adam at homeport.org
Mon Dec 15 17:26:55 PST 2003


On Sun, Dec 14, 2003 at 10:36:02AM -0500, John Kelsey wrote:
| Textual analysis correctly identified the author of _Primary Colors_, 
| though that was from a pretty small field of people with the right level of 
| inside knowledge.  Does anyone know whether there have been real randomized 
| trials of any of the textual analysis software or techniques?  E.g., is 

Not as far as I know, and I spent a bit of time reading through both
"Author Unknown," by Don Foster (who named Klein), and "Analyzing for
Authorship," by Jill Farringdon.

Foster is an English professor.  He reads the work under analysis, and
then works by the potential authors.  His technique would be described
as intuitive, but the human brain has a great capacity for making
linkages.

"Analyzing for Authorship," from the University of Wales Press, really
didn't strike me as better.  It uses a technique called "CUSUM," but
the methodology and graphs (as I recall) vary from text to text, and
neither I nor Alice, who read the book for ZKS, wondering if we could
build this stuff into a product, was very impressed by it.
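
For readers unfamiliar with the idea, here is a minimal Python sketch of
a cumulative-sum comparison of the general kind the book's title
technique refers to.  It is not taken from the book: the sentence
splitter, the choice of "habit" (words of two or three letters, or words
starting with a vowel), and all function names are assumptions made for
illustration.  The usual reading is that the two curves track each other
for a single author, which is exactly the sort of claim that would need
a controlled trial.

```python
import re

def sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def cusum(values):
    # Running sum of deviations from the mean; the last value is ~0.
    if not values:
        return []
    mean = sum(values) / len(values)
    total, out = 0.0, []
    for v in values:
        total += v - mean
        out.append(total)
    return out

def habit_count(sentence):
    # Hypothetical "habit": words of 2-3 letters or starting with a vowel.
    words = re.findall(r"[A-Za-z']+", sentence)
    return sum(1 for w in words
               if len(w) in (2, 3) or w[0].lower() in "aeiou")

def cusum_pair(text):
    # One curve for sentence length in words, one for the habit count.
    sents = sentences(text)
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sents]
    habits = [habit_count(s) for s in sents]
    return cusum(lengths), cusum(habits)
```

Plotting the two lists returned by cusum_pair() on a shared axis gives
the kind of graph the technique relies on; how the axes are scaled is a
free choice, which is one place the results can vary from text to text.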

| It's not obvious to me how you'd change your writing style to defeat these 
| textual analysis schemes--would it really be as simple as changing the 
| average length of sentences and getting rid of the big words, or would 
| there still be ways to determine your identity from that text?  I'm 
| thinking especially of long discussions of technical topics--if I wrote a 
| five page essay on what to look at when trying to cryptanalyze a new block 
| cipher, I think it would be hard to keep readers who knew me from having a 
| pretty good guess about the author, even if I tried changing terms, being 
| more mathematical and less conversational, etc.  (Though this is more of a 
| problem with humans familiar with my writing style, rather than with 
| automated analysis.)
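
To make the quoted question concrete, here is a minimal Python sketch of
the surface features it mentions (average sentence length, share of big
words) alongside function-word frequencies, which are a commonly used
stylometric feature and are harder to change deliberately.  The word
list, the seven-letter threshold for a "big" word, and the name
style_profile are all assumptions chosen for illustration, not any
particular tool's method.

```python
import re
from collections import Counter

# Illustrative list only; real studies use much longer lists.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "but")

def style_profile(text, big=7):
    # Naive sentence and word splitting (assumptions, not a standard).
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    if not sents or not words:
        return {}
    counts = Counter(words)
    profile = {
        # The two features the quoted text proposes changing.
        "avg_sentence_len": len(words) / len(sents),
        "big_word_frac": sum(1 for w in words if len(w) >= big) / len(words),
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        profile["fw_" + fw] = counts[fw] / len(words)
    return profile
```

Shortening sentences and dropping big words moves only the first two
numbers; whether the remaining ones are enough to identify an author is
the open question.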

So, the question boils down to economics.  There's how much you need
to communicate, how much someone is willing to spend to tag you, and
how good their proof needs to be.  I suspect that for most purposes,
proof does not need to be very strong in relation to your need to
communicate.  That is, if Tricky Dick thinks you're Deep Throat, or
Saddam thinks you're the guy who betrayed him, suspicion is enough.

Adam



-- 
"It is seldom that liberty of any kind is lost all at once."
					               -Hume





More information about the cypherpunks-legacy mailing list