I thought it was bad enough when I started getting 400+ spam emails every single day delivered to my primary inbox. Long ago I began trying to deal with this using a handful of simple regular expressions, but quickly decided that such an approach required too large a continuous investment of time. After researching the alternatives, I finally moved on to SpamAssassin. Though it is not a perfect solution, I have managed to tune it to the point where I don't usually get more than one false positive a month, and a reasonable number of false negatives. Since I recently upgraded to version 3.0, things appear to have gotten even better.
Now there's a new annoyance beginning: spam comments on KernelTrap. So far there haven't been many (no more than one or two a week), but I suspect it's only a matter of time. Eventually someone will write a custom tool that can automatically post spam to Drupal-powered websites. Inspired by my experience with SpamAssassin, I decided to write my own spam filter in PHP using Bayesian logic. It turned out to be quite simple, taking only a few hours last weekend to get something moderately functional.
This has had the odd effect that I no longer dread finding spam email, and am instead quite excited, as it gives me something to feed my new filter. Time will tell if it actually learns to distinguish spam from non-spam, but so far it's proving fun to tinker with. If it takes me six months to gather enough spam comments on my website to actually train the filter, I won't complain.
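For the curious, here is a rough idea of what "feeding" and "training" such a filter means. This is a minimal sketch in Python, not the PHP code described above. The class name, method names, tokenizer, and thresholds (0.4 for unseen words, clamping to 0.01 and 0.99, scoring on the 15 most telling words) are all assumptions chosen for illustration, loosely in the spirit of the word-probability approach; a real filter would need far more training data and tuning.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy word-frequency spam filter (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam_words = Counter()  # word -> number of spam messages containing it
        self.ham_words = Counter()   # word -> number of non-spam messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    @staticmethod
    def tokenize(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, is_spam):
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_words.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_words.update(tokens)
            self.ham_msgs += 1

    def word_probability(self, word):
        s = self.spam_words[word]
        h = self.ham_words[word]
        if s + h == 0:
            return 0.4  # assumed value for words never seen in training
        spam_rate = s / max(self.spam_msgs, 1)
        ham_rate = h / max(self.ham_msgs, 1)
        p = spam_rate / (spam_rate + ham_rate)
        # Clamp so that no single word is treated as absolute proof.
        return min(0.99, max(0.01, p))

    def score(self, text, top_n=15):
        probs = [self.word_probability(w) for w in self.tokenize(text)]
        if not probs:
            return 0.5  # nothing to go on
        # Keep only the words that are furthest from neutral (0.5).
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        probs = probs[:top_n]
        # Combine as prod(p) / (prod(p) + prod(1 - p)), computed in log space.
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1 - p) for p in probs)
        return 1 / (1 + math.exp(log_ham - log_spam))


if __name__ == "__main__":
    f = TinyBayesFilter()
    f.train("buy cheap pills now", is_spam=True)
    f.train("new scheduler patch merged", is_spam=False)
    print(f.score("cheap pills"), f.score("scheduler patch"))
```

A score near 1 suggests spam and a score near 0 suggests a legitimate comment. Every message passed to train() shifts the per-word probabilities, which is why a steady supply of spam is actually useful here.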
Anyone interested in how Bayesian spam filters work should check out Paul Graham's papers "A Plan for Spam" and "Better Bayesian Filtering". Beyond that, the CRM114 papers from the MIT Spam Conferences are also worth a read. So we go, into a Brave New World...
Cool stuff
First off, thanks for putting this great site together. I imagine sometimes it seems like a thankless job, but I know I sure enjoy reading the threads you summarize. Excellent job. It's one of the regular sites I read, and that says a lot considering there's maybe a dozen that I frequent once a day.
Anyway, also wanted to thank you for the links above. Those are excellent docs. I needed to find this info once and ended up reading some things that weren't nearly as informative.
--Brian Vincent