On Fri, Sep 05, 2003 at 09:01:51AM -0700, Major Variola (ret) wrote:
Can we assume that the spam is generated by regexp-type programs?
If so, are there good methods for inferring the regexp from examples, and for using it to derive spam-filtering rules?
Good project for a machine learning type.
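One naive way into the inference question is to line up example subject lines token by token and generalize each position. A position that is identical everywhere stays a literal; a position that varies becomes a character class with the observed length range. This is a minimal sketch in Python and a toy rather than a real grammar-inference algorithm. It assumes whitespace-separated tokens and that lines with the same token count came from the same template. `infer_pattern` is a hypothetical helper, and the second sample subject in the usage below is made up for illustration.

```python
import re
from collections import defaultdict

def _generalize_column(tokens):
    """Return a regex fragment matching every token seen at one position."""
    if len(set(tokens)) == 1:
        # Identical in every example: keep it as an escaped literal.
        return re.escape(tokens[0])
    lo = min(len(t) for t in tokens)
    hi = max(len(t) for t in tokens)
    if all(t.isascii() and t.isalpha() and t.islower() for t in tokens):
        # Random-looking lowercase "words": [a-z] with the observed lengths.
        return f"[a-z]{{{lo},{hi}}}"
    if all(t.isascii() and t.isdigit() for t in tokens):
        return f"[0-9]{{{lo},{hi}}}"
    # Mixed content: fall back to any run of non-space characters.
    return r"\S+"

def infer_pattern(examples):
    """Build one anchored regex covering all examples (toy column aligner).

    Lines are grouped by token count (assumed: one group per template),
    and each group becomes one alternative in the final pattern.
    """
    by_length = defaultdict(list)
    for line in examples:
        tokens = line.split()
        by_length[len(tokens)].append(tokens)
    branches = []
    for length in sorted(by_length):
        columns = zip(*by_length[length])
        branches.append(r"\s+".join(_generalize_column(list(c)) for c in columns))
    return "^(?:" + "|".join(branches) + ")$"

subjects = [
    "RE: your medications fygbzdwvyyjqvvpnj uyaecf ixoimctgdtrn kwqs mxatjr",
    "RE: your medications qzxkw abcdefghij lmno pqrstuv ab",  # hypothetical
]
pattern = infer_pattern(subjects)
print(pattern)
print(bool(re.fullmatch(pattern, "RE: your medications abcde xyzxyz qwertyuiop asdf zxcvb")))
```

The per-position alignment only works while the template keeps a fixed number of tokens. Two examples also give very loose length bounds, so a rule like this would need many samples (and some ham to test against) before it is safe to filter on.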
My unscientific observations are that there are at least 6 or 8 different formats. Some are pretty long, e.g.:
Subject: RE: your medications fygbzdwvyyjqvvpnj uyaecf ixoimctgdtrn kwqs mxatjr
(that one could be encrypted text); others are short or contain only numbers.

My favorite spam-obfuscation technique is breaking up key words with HTML comments, e.g. pen<!--Mary had a little la-->is. (That won't show if you are using a mail reader that renders HTML... read the source.)

Eric
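The HTML-comment trick defeats naive keyword matching on the raw source, but it is easy to undo: delete the comments first, then match. This is a minimal sketch in Python. The regex is a simplification that assumes well-formed `<!-- ... -->` comments and ignores HTML's odder comment edge cases. `strip_html_comments`, `contains_keyword` and the keyword list are hypothetical names chosen for illustration.

```python
import re

# Non-greedy match of one HTML comment; DOTALL lets it span line breaks.
# Assumes well-formed <!-- ... --> comments (a simplification).
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_html_comments(body):
    """Remove HTML comments so words split up by them are rejoined."""
    return HTML_COMMENT.sub("", body)

def contains_keyword(body, keywords):
    """Case-insensitive substring check on the comment-stripped text."""
    text = strip_html_comments(body).lower()
    return any(k.lower() in text for k in keywords)

print(strip_html_comments("pen<!--Mary had a little la-->is"))  # penis
```

Stripping comments before tokenizing or matching means the filter sees the same text the HTML-rendering mail reader shows.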