Re: Anti-Spambot: what algorithm should be used?

ichudov@algebra.com (Igor Chudov @ home) writes:
Hi,
As we all know, there exist certain programs, called "spambots", whose task is to post various messages to many newsgroups simultaneously.
Besides their posting functionality, certain spambot programs take special care to make their spams undetectable by anti-spambots. In particular, they can be programmed to modify certain fields or the message text itself in such a way that these messages would not look identical, but would still carry the same content.
We can generalize the things that spambots might do and suggest that a general spambot would do the following to avoid spam detection:
1) Modify all header fields (for example From:, Subject:, etc.) with each spam posting.
2) Follow up to other articles posted to newsgroups, so that the spams look like genuine, unique messages to readers and defeat spam detectors.
Right - if you're following up in a newsgroup, you can just re-use the subject. You can also use the "From:" and other headers from one of the regular posters.
3) Randomly alter the spam message proper so that blindly comparing copies would be futile. Such alterations may include interchanging certain synonymous words, adding spaces or punctuation, or simply changing the line-wrapping length.
If you're following up, then rather than being random, you can tailor your response based on the message you're following up on - kind of like Eliza or better. :-)
4) Swap paragraphs and phrases.
5) Add random headers, footers & fillings (like ASCII art).
I am sure that the readers can come up with more examples.
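To make the list above concrete, here is a minimal sketch of why a blind byte-for-byte comparison fails against changes in case, spacing, punctuation and line wrapping, and how far a simple normalization gets. Nothing here comes from the original post: the function names, the use of three-word shingles and the Jaccard measure are all assumptions chosen for illustration.

```python
import re

def normalize(text):
    """Lowercase the text and return its words, ignoring punctuation,
    spacing and line wrapping."""
    return re.findall(r"[a-z0-9]+", text.lower())

def shingles(words, k=3):
    """Return the set of all k-word windows (shingles) in the word list."""
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(normalize(a), k), shingles(normalize(b), k)
    return len(sa & sb) / len(sa | sb)
```

Two copies that differ only in wrapping, case or punctuation score 1.0 under this measure. Swapping a synonym lowers the score, because every shingle touching the changed word no longer matches, and a follow-up tailored to each thread would lower it much further, so this covers only the simpler alterations.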
The task (or the problem) is:
a) Come up with a reasonable set of assumptions about what such a spambot would or could do.
b) Create an algorithm that prints the Message-IDs of messages that have identical content, so that most, if not all, of the algorithm's judgments are correct, assuming that the spambot operates within the limits of a).
A message can be thought of as a sequence of words, phrases and paragraphs, as well as a set of header lines.
The Path: header field may need special treatment.
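As a rough illustration of what task b) might look like, the sketch below treats each article as a set of header lines plus a bag of body words, ignores every header except Message-ID (on the assumption that the spambot rewrites them all), and groups articles whose word bags mostly overlap. The function names, the 0.8 threshold and the greedy grouping are assumptions made for this example, not part of the original proposal. It also assumes plain, single-part articles. Because word order is discarded, swapped paragraphs and changed line wrapping do not affect the result, while synonym substitution and added filler do.

```python
import re
from collections import Counter
from email import message_from_string

def body_words(raw):
    """Parse one raw article; return (Message-ID, bag of body words)."""
    msg = message_from_string(raw)
    # Assumption: single-part text articles; multipart bodies are skipped.
    body = "" if msg.is_multipart() else str(msg.get_payload())
    return msg["Message-ID"], Counter(re.findall(r"[a-z0-9]+", body.lower()))

def overlap(a, b):
    """Multiset Jaccard overlap of two word bags, from 0.0 to 1.0."""
    total = sum((a | b).values())
    return sum((a & b).values()) / total if total else 1.0

def suspected_spam(raw_articles, threshold=0.8):
    """Greedily group articles by body overlap and return the groups
    of Message-IDs that contain more than one article."""
    groups = []  # each entry: (word bag of first member, list of Message-IDs)
    for raw in raw_articles:
        mid, bag = body_words(raw)
        for rep, ids in groups:
            if overlap(rep, bag) >= threshold:
                ids.append(mid)
                break
        else:
            groups.append((bag, [mid]))
    return [ids for _, ids in groups if len(ids) > 1]
```

Each list returned by `suspected_spam` holds the Message-IDs of one suspected set of copies, ready to be printed. Since Path: is ignored along with the other headers, any special treatment of it would have to be added separately.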
I don't think it's possible.

---

Dr.Dimitri Vulis KOTM
Brighton Beach Boardwalk BBS, Forest Hills, N.Y.: +1-718-261-2013, 14.4Kbps