Dealing with spam comments is one of the most annoying parts of having a blog. Over time I’ve tried a few different methods to control it.
My first attempt involved flagging messages as potential spam if they had a bad referrer, came from certain IP ranges, contained certain keywords, or contained too many links. Flagged messages weren’t displayed until I approved them. This method worked some of the time, but overall it wasn’t very effective and created a lot of work for me.
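For illustration, the flagging rules above might look something like this in Python. The post doesn't list the actual referrers, IP ranges, keywords, or link limit, so every value below (including `MAX_LINKS = 3` and the documentation-only `192.0.2.0/24` range) is a hypothetical placeholder, and `is_suspect` is a made-up name.

```python
import ipaddress
import re

# Hypothetical example lists; the real ones were never published.
BAD_REFERRER_WORDS = ["casino", "pills"]
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]  # TEST-NET-1, illustrative only
SPAM_KEYWORDS = ["viagra", "poker"]
MAX_LINKS = 3  # assumed threshold for "too many links"

LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def is_suspect(body: str, ip: str, referrer: str) -> bool:
    """Return True if the comment should be held until manually approved."""
    # Rule 1: referrer contains a blacklisted word.
    ref = referrer.lower()
    if any(word in ref for word in BAD_REFERRER_WORDS):
        return True
    # Rule 2: sender's IP falls inside a blocked range.
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    # Rule 3: comment body contains a spam keyword.
    text = body.lower()
    if any(word in text for word in SPAM_KEYWORDS):
        return True
    # Rule 4: more links than the allowed maximum.
    return len(LINK_RE.findall(body)) > MAX_LINKS
```

Every rule here is a blacklist, which helps explain the workload described above: each new spammer means another entry to add by hand.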
My next attempt was to implement a captcha system. This has worked out pretty well, and I was surprised to realize that it has been fifteen months since I started this policy. Basically, any comment containing anything that looks like a link (“http”, “://”, “www.”, “.com”, “.net”, “.org”, “@”, “href”, and some others I can’t remember off the top of my head) is sent to the captcha page. This allows real humans to post comments with links but stops most of the spam. However, the occasional spam comment contained no links at all. I’m not sure why spammers would do this, unless they are trying to game systems that deem an IP “safe” after it posts a non-spam comment. In any case, I grew annoyed with cleaning up these comments, and I didn’t like that the captcha page requires cookies.
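A minimal sketch of the link check in Python. It uses only the markers named above (the others the post mentions aren't recorded), and the function names and the `"captcha"`/`"publish"` labels are hypothetical.

```python
# Substrings named in the post; the original list had a few more
# that aren't recorded, so this tuple is incomplete by design.
LINK_MARKERS = ("http", "://", "www.", ".com", ".net", ".org", "@", "href")


def looks_like_link(comment: str) -> bool:
    """True if the comment contains anything resembling a link."""
    text = comment.lower()
    return any(marker in text for marker in LINK_MARKERS)


def route_comment(comment: str) -> str:
    """Decide where a submitted comment goes next (labels are hypothetical)."""
    return "captcha" if looks_like_link(comment) else "publish"
```

Plain substring matching produces false positives (for example, "it.Coming" contains ".com"), but that is tolerable here because a match only sends the commenter to the captcha instead of rejecting the comment.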
Over the break, I decided to implement a honeypot of sorts. I noticed that spam comments were always posted to my most recent entry, even when comments were still open on other posts. My guess was that the spambots look for the first form in the HTML, so I added a comment form at the very top of my page and wrapped it in a hidden div. Humans never see it, but spambots do.
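The honeypot could be sketched in Python roughly as follows. The post doesn't say how its pages are generated, so the endpoint paths (`/comments/trap`, `/comments/post`) and the helper names are all hypothetical. The idea is that the trap form comes first in the markup, is hidden with CSS, and posts to its own URL, so any submission there can be treated as spam.

```python
# Hypothetical endpoint paths; the real URLs aren't described in the post.
HONEYPOT_ACTION = "/comments/trap"
REAL_ACTION = "/comments/post"

# Hidden trap form: bots scanning for the first <form> find this one.
HONEYPOT_FORM = f"""<div style="display: none">
  <form method="post" action="{HONEYPOT_ACTION}">
    <textarea name="comment"></textarea>
    <input type="submit" value="Post">
  </form>
</div>"""


def render_page(body_html: str) -> str:
    """Place the hidden honeypot form ahead of everything else on the page."""
    return f"<body>\n{HONEYPOT_FORM}\n{body_html}\n</body>"


def is_honeypot_hit(path: str) -> bool:
    """A POST to the trap endpoint is assumed to come from a bot."""
    return path == HONEYPOT_ACTION
```

One caveat: `display: none` relies on CSS, so a visitor browsing without stylesheets would see the extra form. A short note inside it asking humans to ignore it would reduce accidental hits.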
To judge its effectiveness, I logged every comment submitted to the honeypot. Since I don’t value the privacy of spammers, you’re welcome to view the log. As of right now, 212 spam comments have been submitted in 10 days. An entry labeled “honeypot” means the message was submitted to the honeypot form. If a message had been submitted from a valid form but contained links (and was therefore sent to the captcha), it would be labeled “contains_links.” There aren’t any of those.
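As a rough illustration, a log using the two labels mentioned above could be written and tallied like this. The post doesn't describe the log's actual format, so the tab-separated layout (timestamp, label, IP, comment) and the function names `log_spam` and `count_by_label` are assumptions.

```python
import datetime
from typing import Dict, Optional, TextIO

# The two labels mentioned in the post; the record layout is assumed.
LABELS = ("honeypot", "contains_links")


def log_spam(out: TextIO, label: str, ip: str, comment: str,
             when: Optional[datetime.datetime] = None) -> None:
    """Append one tab-separated record: timestamp, label, IP, comment."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    when = when or datetime.datetime.now(datetime.timezone.utc)
    # Collapse all whitespace (including tabs/newlines) so each record stays on one line.
    flat = " ".join(comment.split())
    out.write(f"{when.isoformat()}\t{label}\t{ip}\t{flat}\n")


def count_by_label(log_text: str) -> Dict[str, int]:
    """Tally records per label, e.g. to compare honeypot vs. captcha hits."""
    counts: Dict[str, int] = {}
    for line in log_text.splitlines():
        label = line.split("\t")[1]
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Running `count_by_label` over the file is one way to produce a tally like the 212 honeypot hits reported above.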
So now I have a system that is so far 100% effective, without requiring cookies and without breaking under tabbed browsing. If this continues to be effective I’ll probably disable the captchas altogether. Of course, the spammers could pretty easily overcome this obstacle if they tried.
January 16, 10:50 pm
Also, I’ve extended the length of time that comments are open. Now you have two weeks to leave comments instead of just one.
January 17, 1:26 pm
I’ve been pretty happy with the Spam Karma plugin for WordPress. If you ever decide to give WP a try, I highly recommend it. The default WordPress anti-spam plugin is Akismet, though I’ve never tried it out (you have to create a free account at their website to use it).
What I really hate is all the referral spam in my server logs. I do my best to filter it with .htaccess, but my keyword blacklist has grown a lot larger, which adds overhead to loading my site.