Re: Random musing about words and spam - cypherpunks-legacy - lists.cpunks.org


Re: Random musing about words and spam


Major Variola (ret)

5 Sep 2003

12:20 p.m.

At 09:09 PM 9/4/03 -0700, Eric Murray wrote:

(it's one of about 200 checks my program makes).

Can we assume that the spam is generated by regexp-type programs? If so, are there good methods for inferring the regexp from examples, and using this to infer spamfiltering rules? Good project for a machine learning type.
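One naive way to try the idea is sketched below in Python. It is not a real grammar-induction method, only a position-by-position generalisation: tokens that are identical across all examples stay literal, and tokens that vary are replaced by the narrowest character class that covers them. The function names (`infer_pattern`, `_generalise`), the fixed list of character classes, and the requirement that all examples have the same number of whitespace-separated tokens are assumptions made for illustration.

```python
import re

# Candidate character classes, ordered from narrowest to broadest.
# This list is an illustrative assumption, not a standard.
_CLASSES = [r"\d+", r"[a-z]+", r"[A-Za-z]+", r"\S+"]


def _generalise(tokens):
    """Return a regex fragment covering every token seen at one position."""
    if len(set(tokens)) == 1:
        # Same token in every example: keep it as a literal.
        return re.escape(tokens[0])
    for cls in _CLASSES:
        if all(re.fullmatch(cls, t) for t in tokens):
            return cls
    return r"\S+"  # Unreachable for whitespace-split tokens; kept as a safeguard.


def infer_pattern(examples):
    """Infer a compiled regex from example lines with equal token counts."""
    rows = [line.split() for line in examples]
    if not rows:
        raise ValueError("need at least one example")
    if len({len(row) for row in rows}) != 1:
        raise ValueError("this sketch only handles equal token counts")
    # zip(*rows) groups the tokens found at each position across examples.
    fragments = [_generalise(column) for column in zip(*rows)]
    return re.compile(r"\s+".join(fragments))


# Made-up subject lines, for illustration only.
samples = [
    "RE: your medications fygbz uyaecf",
    "RE: your medications qwkzp mxatjr",
]
pattern = infer_pattern(samples)
```

With these samples, `pattern.fullmatch("RE: your medications abcde zzz")` matches, while `pattern.fullmatch("RE: your meeting notes today")` does not. A generator that varies the number of filler words would defeat this sketch, so a practical attempt would need some form of sequence alignment first.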


Eric Murray

6 Sep 2003

11:42 a.m.

New subject: Random musing about words and spam

On Fri, Sep 05, 2003 at 09:01:51AM -0700, Major Variola (ret) wrote:

Can we assume that the spam is generated by regexp-type programs?

If so, are there good methods for inferring the regexp from examples, and using this to infer spamfiltering rules?

Good project for a machine learning type.

My unscientific observations are that there are at least 6 or 8 different formats. Some are pretty long, e.g.:

Subject: RE: your medications fygbzdwvyyjqvvpnj uyaecf ixoimctgdtrn kwqs mxatjr

(That one could be encrypted text.) Others are short or contain only numbers.

My favorite spam-obfuscation technique is where they break up key words with HTML comments, e.g. penis. (The comments won't show if you are using a mail reader that interprets HTML; read the source.)

Eric
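The HTML-comment trick described above can be undone before keyword matching. Here is a minimal sketch in Python, assuming the filter sees the raw HTML source of the message. The function names `strip_html_comments` and `contains_keyword` are hypothetical, and this is not a description of the roughly 200 checks mentioned earlier in the thread.

```python
import re

# Matches one HTML comment; DOTALL lets a comment span several lines.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(html):
    """Remove HTML comments so that split-up words are rejoined."""
    return _HTML_COMMENT.sub("", html)


def contains_keyword(html, keywords):
    """Case-insensitive substring check, run after removing comments."""
    text = strip_html_comments(html).lower()
    return any(keyword.lower() in text for keyword in keywords)
```

For example, `contains_keyword("med<!-- zq -->ications", ["medications"])` is true, while the same check on the raw string would miss the word. A regular expression is enough for this one trick, but a full HTML parser would be the safer choice against malformed or nested markup.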


Jim Choate

12:10 p.m.

New subject: [cdr] Re: Random musing about words and spam

On Sat, 6 Sep 2003, Eric Murray wrote:

On Fri, Sep 05, 2003 at 09:01:51AM -0700, Major Variola (ret) wrote:

Can we assume that the spam is generated by regexp-type programs?

If so, are there good methods for inferring the regexp from examples, and using this to infer spamfiltering rules?

Good project for a machine learning type.

My unscientific observations are that there's at least 6 or 8 different formats.

Some are pretty long, e.g.:

Subject: RE: your medications fygbzdwvyyjqvvpnj uyaecf ixoimctgdtrn kwqs mxatjr

(that one could be encrypted text)

others are short or have only numbers.

My favorite spam-obfuscation technique is where they break up key words with HTML comments, e.g. penis. (that won't show if you are using a mail reader that interprets HTML... read the source).

There are many patterns to these emails. We've got the 'legitimate' spam, and then there is the spam that gets sent to the list by members who subscribe the list to the spammers.

Then there are emails which are spam sitting in people's inboxes that get retransmitted by viruses, worms, and trojans. They may have started out as spam, but they've been hijacked for more nefarious purposes. Usually these have lots of garbled text in them.

Then there are emails like the previous one, which are just 'snow' to blind the users.

Another is non-English text. We've been seeing a lot more of these over the last six months or so.

We've also been seeing lists of words being sent to email addresses. The purpose is to dictionary-attack the various security passwords on the list.

Considering the human mind, it wouldn't surprise me one bit if a lot of the spam we get came from non-spammers themselves. Priming the pump, so to speak.

--
ravage@ssz.com jchoate@open-forge.com
www.ssz.com www.open-forge.com
