Random musing about words and spam

Sat Sep 6 08:28:13 PDT 2003

On Fri, Sep 05, 2003 at 09:01:51AM -0700, Major Variola (ret) wrote:

> Can we assume that the spam is generated by regexp-type programs?
>
> If so, are there good methods for inferring the regexp from examples,
> and using this to infer spamfiltering rules?
> 
> Good project for a machine learning type.

My unscientific observation
is that there are at least six to eight different formats.

Some are pretty long, e.g.:

>Subject: RE: your medications                             fygbzdwvyyjqvvpnj  uyaecf ixoimctgdtrn kwqs mxatjr

(that one could be encrypted text)

Others are short or contain only numbers.
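
The long format quoted above (a normal-looking subject, a long run of
spaces, then a random tail) looks simple enough to catch with one
hand-written regexp. Here is a rough Python sketch. The function name
split_padded_tail and the five-space threshold are assumptions made up
for illustration, not anything taken from a real filter, and this only
covers that one format.

```python
import re

# Assumed pattern: some visible text, then a run of 5 or more whitespace
# characters, then a tail. The threshold of 5 is an arbitrary guess.
PADDED_TAIL = re.compile(r"^(.*?\S)\s{5,}(\S.*)$")

def split_padded_tail(subject):
    """Return (visible_part, tail); tail is None if there is no padding."""
    match = PADDED_TAIL.match(subject)
    if match is None:
        return subject, None
    return match.group(1), match.group(2)

def looks_padded(subject):
    """True if the subject has the padded-tail shape described above."""
    return split_padded_tail(subject)[1] is not None
```

An ordinary subject such as "RE: lunch on Friday" has no long run of
spaces, so it passes through with a tail of None.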

My favorite spam-obfuscation technique is breaking up key words
with HTML comments, e.g. penis.
(The comments won't show if you are using a mail reader that
interprets HTML; read the source.)
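
A filter can undo the HTML-comment trick by deleting the comments before
it looks for key words. A minimal Python sketch follows. The names
strip_html_comments and contains_keyword are hypothetical, and a regexp
is only an approximation of how a real HTML parser treats comments.

```python
import re

# Matches an HTML comment, including an empty one like <!---->.
# DOTALL lets a comment span more than one line.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_html_comments(html_source):
    """Remove HTML comments so that split-up words are rejoined."""
    return HTML_COMMENT.sub("", html_source)

def contains_keyword(html_source, keywords):
    """Case-insensitive substring check on the comment-free text."""
    visible = strip_html_comments(html_source).lower()
    return any(keyword.lower() in visible for keyword in keywords)
```

For example, a made-up body containing "medi<!-- xq -->cations" does not
match a naive search for "medications", but it does once the comments
are stripped.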

Eric