Random musing about words and spam
Eric Murray
ericm at lne.com
Sat Sep 6 08:28:13 PDT 2003
On Fri, Sep 05, 2003 at 09:01:51AM -0700, Major Variola (ret) wrote:
> Can we assume that the spam is generated by regexp-type programs?
>
> If so, are there good methods for inferring the regexp from examples,
> and using this to infer spamfiltering rules?
>
> Good project for a machine learning type.
My unscientific observation is that there are at least
six to eight different formats.
Some are pretty long, e.g.:
>Subject: RE: your medications fygbzdwvyyjqvvpnj uyaecf ixoimctgdtrn kwqs mxatjr
(That one could be encrypted text.)
Others are short or contain only numbers.
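
For what it's worth, here is a rough sketch of how one might flag
the random-letter junk in a Subject line like the one above. The
rule (a run of four or more consonants, counting "y" as a consonant)
and the name gibberish_tokens are my own assumptions for
illustration, not anything the spammers' tools or an existing
filter are known to use:

```python
import re

# Assumed heuristic: four or more consonants in a row ("y" counted as a
# consonant) is rare in ordinary English words but common in random filler.
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-z]{4,}")

def gibberish_tokens(subject):
    """Return the purely alphabetic words that contain a long consonant run."""
    return [
        word
        for word in subject.lower().split()
        if word.isalpha() and CONSONANT_RUN.search(word)
    ]

subject = "RE: your medications fygbzdwvyyjqvvpnj uyaecf ixoimctgdtrn kwqs mxatjr"
print(gibberish_tokens(subject))
```

It catches three of the five junk words in that Subject and misses
"uyaecf" and "mxatjr", and it would also trip on real words such as
"strengths", so at best it is one weak signal among several.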
My favorite spam-obfuscation technique is where they break up key words
with HTML comments, e.g. pen<!--Mary had a little la-->is.
(That won't show if you are using a mail reader that
interprets HTML; read the source.)
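
A filter can undo this trick by deleting the comments before it
looks for key words. A minimal sketch, assuming a plain substring
match is good enough; the function names and the keyword list are
made up for illustration, and a real filter would want a proper
HTML parser rather than a regexp:

```python
import re

# An HTML comment, possibly spanning several lines (non-greedy match).
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_html_comments(text):
    """Remove HTML comments so that words split by them are rejoined."""
    return HTML_COMMENT.sub("", text)

def hidden_keywords(text, keywords):
    """Return the keywords that appear only once comments are stripped."""
    raw = text.lower()
    cleaned = strip_html_comments(text).lower()
    return [k for k in keywords if k in cleaned and k not in raw]

body = "cheap med<!--Mary had a little la-->ications"
print(strip_html_comments(body))
print(hidden_keywords(body, ["medications", "cheap"]))
```

A key word that only turns up after stripping is arguably a stronger
spam signal than the word itself, since legitimate mail has little
reason to split words with comments.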
Eric