On Sat, 6 Sep 2003, Eric Murray wrote:
On Fri, Sep 05, 2003 at 09:01:51AM -0700, Major Variola (ret) wrote:
Can we assume that the spam is generated by regexp-type programs?
If so, are there good methods for inferring the regexp from examples, and using it to derive spam-filtering rules?
Good project for a machine learning type.
My unscientific observation is that there are at least 6 or 8 different formats.
Some are pretty long, e.g.:
Subject: RE: your medications fygbzdwvyyjqvvpnj uyaecf ixoimctgdtrn kwqs mxatjr
(that one could be encrypted text)
others are short or have only numbers.
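As a rough illustration of inferring a pattern from examples, here is a minimal sketch. It assumes the generator emits a fixed leading phrase followed by random filler tokens. The function name `infer_pattern` and the second sample subject are hypothetical, made up for illustration; only the first subject comes from the message above. It is nowhere near real regexp induction, just the simplest thing that generalizes from a handful of samples.

```python
import re

def infer_pattern(examples):
    """Build a crude regex: shared leading tokens as literals, then a generalized tail."""
    token_lists = [e.split() for e in examples]

    # Count leading tokens that are identical across every example.
    prefix_len = 0
    for column in zip(*token_lists):
        if len(set(column)) != 1:
            break
        prefix_len += 1
    prefix = token_lists[0][:prefix_len]

    # Pick the narrowest character class that covers every remaining token.
    tails = [t for tokens in token_lists for t in tokens[prefix_len:]]
    if tails and all(t.isdigit() for t in tails):
        cls = r"\d+"
    elif tails and all(t.isascii() and t.isalpha() and t.islower() for t in tails):
        cls = r"[a-z]+"
    else:
        cls = r"\S+"

    # Require at least one filler token only if every example had one.
    every_has_tail = all(len(tokens) > prefix_len for tokens in token_lists)
    quant = "+" if every_has_tail else "*"

    if prefix:
        literal = r"\s+".join(re.escape(t) for t in prefix)
        body = literal + r"(?:\s+" + cls + ")" + quant
    else:
        body = cls + r"(?:\s+" + cls + ")*"
    return re.compile("^" + body + "$")

samples = [
    # From the message above.
    "RE: your medications fygbzdwvyyjqvvpnj uyaecf ixoimctgdtrn kwqs mxatjr",
    # Hypothetical second sample in the same format.
    "RE: your medications qwkzpt hvbnmx",
]
pattern = infer_pattern(samples)
```

Note the obvious weakness: a legitimate subject such as "RE: your medications are ready" also matches, because the tail class cannot tell random letters from real words. A usable filter would need some measure of how word-like the filler is.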
My favorite spam-obfuscation technique is where they break up key words with HTML comments, e.g. pen<!--Mary had a little la-->is. (That won't show if you are using a mail reader that interprets HTML; read the source.)
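One way to undo the HTML-comment trick is to strip comments before matching keywords, so the filter sees what an HTML-rendering mail reader would show. This is a minimal sketch under that assumption. The names `strip_html_comments` and `contains_keyword`, the keyword list, and the sample string are all hypothetical. A regex is not a full HTML parser, so this only handles plain `<!-- ... -->` comments.

```python
import re

# Non-greedy match of one HTML comment; DOTALL lets a comment span lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_html_comments(html):
    """Remove HTML comments so split-up words are rejoined."""
    return HTML_COMMENT.sub("", html)

def contains_keyword(html, keywords):
    """Case-insensitive substring check on the comment-free text."""
    text = strip_html_comments(html).lower()
    return any(k.lower() in text for k in keywords)

# Hypothetical sample using the same trick as the example above.
sample = "Cheap medi<!--Mary had a little la-->cations here"
hit_raw = any(k in sample.lower() for k in ["medications"])
hit_clean = contains_keyword(sample, ["medications"])
```

On the sample, the naive check on the raw source misses the keyword, while the check on the stripped text finds it.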
There are many patterns to these emails. We've got the 'legitimate' spam, and then there is the spam that gets sent to the list because members subscribe the list to the spammers.

Then there are emails that started as spam sitting in people's inboxes and get retransmitted by viruses, worms, and trojans. They may have started out as spam, but they've been hijacked for more nefarious purposes. Usually these have lots of garbled text in them. Then there are emails like the one above, which are just 'snow' to blind the users.

Another is non-English text. We've been seeing a lot more of these over the last six months or so.

We've also been seeing lists of words being sent to email addresses. The purpose is to dictionary-attack the various passwords on the list.

It wouldn't surprise me one bit, the human mind being what it is, if a lot of the spam we get comes from people who aren't spammers themselves. Priming the pump, so to speak.

--
ravage@ssz.com
jchoate@open-forge.com
www.ssz.com
www.open-forge.com