Re: Random musing about words and spam

3 Sep 2003

      Hello,

On Wed, 3 Sep 2003, Thomas Shaddack wrote:
> ...
> Spammers recently adopted tactics of using randomly generated words, e.g.
> "wryqf", in both the subject and the body of the message. These
> "pseudowords" are random, which makes them different from real words that
> are made of syllables.
> Could the pseudowords be easily detected by their characteristics, e.g.
> presence of syllables, vowel-consonant sequences/ratio, something like
> that? This could shift the balance of force in spam detection again, until
> the adversary will be forced to adopt the tactics of generating the random
> words from syllables instead of characters. Presence of pseudowords then
> could be added as one of spam characteristics.

I have, for a year or so now, been wondering about all the odd character
strings I am finding in the subjects and bodies of my spam, and I too
thought about keying on these for detection.

However, I immediately abandoned the idea: a quick glance over my
legitimate email (to and from developers, technical mailing lists, etc.)
revealed that almost all of it also contains seemingly random bits of
gibberish and pseudowords.

Try to write the logic that distinguishes this:

if_gre in the tree passes the mbuf to netisr_dispatch(), which in turn
calls if_handoff(), which does something similar.

(hackers@freebsd.org)

from this:

dyeiluykxoer dyeiluykcqkutknig dyeiluykkrpmhrku dyeiluykngeqx
dyeiluykoybim dyeiluykbihlyrelg dyeiluyktwucinmdyeiluykwenmttwvm

(actual spam)
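
To make the difficulty concrete, here is a minimal sketch of the kind of
rule proposed in the quoted message: flag any word containing a long run of
consecutive consonants. The function names, the threshold of 3, and the
choice to count "y" as a consonant are all assumptions made for
illustration; nothing in this thread specifies them. The identifiers in the
last line (strncpy, pthread, sysctl) are likewise examples of my own, not
taken from the quoted mail.

```python
import re

VOWELS = set("aeiou")  # assumption: 'y' is counted as a consonant


def longest_consonant_run(word):
    """Length of the longest stretch of consecutive non-vowel letters."""
    longest = current = 0
    for ch in word.lower():
        if ch in VOWELS:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def looks_random(word, max_run=3):
    """Hypothetical rule: flag a word whose consonant run exceeds max_run."""
    return longest_consonant_run(word) > max_run


def flagged_words(text):
    """Return the alphabetic tokens in text that the rule flags."""
    return [w for w in re.findall(r"[A-Za-z]+", text) if looks_random(w)]


legit = ("if_gre in the tree passes the mbuf to netisr_dispatch(), which in "
         "turn calls if_handoff(), which does something similar.")
spam = ("dyeiluykxoer dyeiluykcqkutknig dyeiluykkrpmhrku dyeiluykngeqx "
        "dyeiluykoybim dyeiluykbihlyrelg dyeiluyktwucinmdyeiluykwenmttwvm")

print(flagged_words(legit))
print(flagged_words(spam))
print(flagged_words("strncpy pthread sysctl"))
```

With this particular threshold the FreeBSD sentence passes untouched, but
two of the seven spam tokens (dyeiluykxoer and dyeiluykoybim) also slip
through, and all three of the ordinary identifiers on the last line are
flagged. Any threshold trades one kind of error for the other.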

I must reiterate that, given the relentless efficiency of spam-spiders,
my current favorite anti-spam mechanism is simply to publish a shadow email
address on every web document that carries your real email address, and to
delete all email sent to both accounts.  Simple to DIY, and requires
no centralization.
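
A rough sketch of how the shadow-address filter might look, under stated
assumptions: each mailbox is a list of raw message strings, and "sent to
both accounts" is approximated by comparing a hash of the
whitespace-normalized, lowercased body. That matching rule and the names
fingerprint and filter_real_inbox are hypothetical; the mechanism as
described above does not say how the copies are matched. The addresses are
placeholders.

```python
import hashlib
from email import message_from_string


def fingerprint(raw_message):
    """SHA-256 of the lowercased, whitespace-normalized message body."""
    msg = message_from_string(raw_message)
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body text of their own
        chunks.append(part.get_payload(decode=True) or b"")
    normalized = b" ".join(b" ".join(chunks).lower().split())
    return hashlib.sha256(normalized).hexdigest()


def filter_real_inbox(real_inbox, shadow_inbox):
    """Drop every real-inbox message whose body also reached the shadow address."""
    seen_at_shadow = {fingerprint(m) for m in shadow_inbox}
    return [m for m in real_inbox if fingerprint(m) not in seen_at_shadow]


spam_to_shadow = ("From: a@example.com\nTo: shadow@example.org\n"
                  "Subject: x\n\nBuy  now dyeiluykxoer\n")
spam_to_real = ("From: b@example.com\nTo: me@example.org\n"
                "Subject: y\n\nbuy now   dyeiluykxoer\n")
legit = ("From: friend@example.com\nTo: me@example.org\n"
         "Subject: mbuf\n\nif_gre passes the mbuf along.\n")

kept = filter_real_inbox([spam_to_real, legit], [spam_to_shadow])
print(len(kept))
```

Exact body hashing is the simplest choice and is defeated by per-recipient
random strings; matching on sender or on a fuzzier digest would be an
alternative, with its own false positives.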

-----
John Kozubik - john@kozubik.com - http://www.kozubik.com