[cdr] Re: Random musing about words and spam

6 Sep 2003

      On Sat, 6 Sep 2003, Eric Murray wrote:
...
On Fri, Sep 05, 2003 at 09:01:51AM -0700, Major Variola (ret) wrote:
...
Can we assume that the spam is generated by regexp-type programs?
If so, are there good methods for inferring the regexp from examples,
and using this to infer spamfiltering rules?
Good project for a machine learning type.
My unscientific observation
is that there are at least 6 or 8 different formats.
Some are pretty long, e.g.:
...
Subject: RE: your medications                             fygbzdwvyyjqvvpnj  uyaecf ixoimctgdtrn kwqs mxatjr
(that one could be encrypted text)
Others are short or contain only numbers.
My favorite spam-obfuscation technique is where they break up key words
with HTML comments, e.g. pen<!--Mary had a little la-->is.
(That won't show up if you are using a mail reader that
interprets HTML... read the source.)
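A filter can undo this particular trick by deleting HTML comments before it looks for keywords. Below is a minimal Python sketch, assuming the message body has already been decoded to a plain string; the helper names `strip_html_comments` and `contains_keyword` are hypothetical, and a regex is only an approximation of how real HTML readers parse comments.

```python
import re

# Matches one HTML comment, including comments that span several lines.
# The non-greedy ".*?" removes two comments in a row separately instead of
# swallowing the visible text between them.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(body: str) -> str:
    """Remove HTML comments so the text matches what an HTML mail reader shows."""
    return _HTML_COMMENT.sub("", body)


def contains_keyword(body: str, keyword: str) -> bool:
    """Hypothetical filter check: case-insensitive search after de-obfuscation."""
    return keyword.lower() in strip_html_comments(body).lower()


if __name__ == "__main__":
    sample = "pen<!--Mary had a little la-->is"
    print(strip_html_comments(sample))        # penis
    print(contains_keyword(sample, "PENIS"))  # True
```

Normalizing first and matching second keeps the keyword rules simple; the same idea extends to other invisible-markup tricks, but each needs its own normalization step.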
There are many patterns to these emails.

We've got the 'legitimate' spam, and then there is the spam that gets sent
to the list by members who subscribe the list to the spammers.

Then there are emails, originally spam sitting in people's inboxes, that get
retransmitted by viruses, worms, and trojans. They may have started out as
spam, but they've been hijacked for more nefarious purposes. Usually these
have lots of garbled text in them.

Then there are emails like the previous ones that are just 'snow' to blind
the users.

Another is non-English text. We've been seeing a lot more of these over
the last six months or so.

We've also been seeing lists of words being sent to email addresses. The
purpose is to mount a dictionary attack on the various security passwords
on the list.

Considering the human mind, it wouldn't surprise me one bit if a lot of
the spam we get is from non-spammers themselves. Priming the pump, so to
speak.

 -- --
      ravage@ssz.com                            jchoate@open-forge.com
      www.ssz.com                               www.open-forge.com

Jim Choate