
Drupal, Mollom, and the Future of Blog Spam

Added by Kas Thomas on 11-Jun-2008 | Twitter: @KasThomas

Is it just me, or has anyone else been struck by the lack of attention being paid to blog comment spam?

No one needs a reminder of how severe the spam problem is with e-mail. But e-mail spam is just one piece of the spam pie. (Oh man, talk about a hard-to-swallow metaphor...) Somewhere between 80 and 90 percent of comments posted to blogs and wikis come from spambots or their human surrogates. Bear in mind that, as technologies go, blogging is fairly new compared with e-mail: we're still near the beginning of the blog-spam curve.

To the extent that Social Software and Web CMS vendors sell, bundle, or pre-integrate blog and wiki solutions for you to employ beyond the firewall, they're selling you spam magnets as part of the deal. But they're not necessarily helping you with spam filtration.

You'd expect Social Software purveyors to be pioneers in this area, and some of them have decent services. But surprisingly, many of the vendors covered in our just-published Enterprise Social Software Report 2008: Networking & Collaboration Within and Beyond the Enterprise scored rather poorly on anti-spam capabilities.

Typical remedies for blog spam include comment moderation, challenge-response techniques, and automated filtering based on some combination of reputation assessment and AI-based text analysis. There are problems with all three approaches.

Moderation amounts to processing every comment by hand. That is impractical in many cases, and it will only become more so as comment volumes grow.

A more practical deterrent is the CAPTCHA, a common challenge-response technique. The idea is that if you can correctly identify the letters in a distorted GIF image of a word, you're a human rather than a spambot and can therefore be trusted not to post garbage. The CAPTCHA deters robots remarkably well (so far, at least), but it also deters some legitimate posters: not everyone wants to play a word game in order to leave a comment. Nor will it deter a malicious human. Offshore boiler rooms of paid CAPTCHA-breakers can (and do) still break through.
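
The server-side bookkeeping behind a challenge-response scheme like this can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular CAPTCHA service works: the function names and the in-memory `_pending` store are hypothetical, and the step that renders the answer as a distorted image is omitted entirely.

```python
import hmac
import secrets
import string

# Hypothetical in-memory store: challenge token -> expected answer.
# A real deployment would render the answer as a distorted image,
# expire stale challenges, and keep this state in shared storage.
_pending: dict[str, str] = {}

ALPHABET = string.ascii_uppercase + string.digits


def issue_challenge(length: int = 6) -> tuple[str, str]:
    """Create a random challenge and return (token, answer).

    In practice only the token and an image of the answer would go to
    the client; the answer is returned here so the sketch can be run.
    """
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = secrets.token_urlsafe(16)
    _pending[token] = answer
    return token, answer


def verify_response(token: str, response: str) -> bool:
    """Check a submitted answer; each challenge can be used only once."""
    expected = _pending.pop(token, None)
    if expected is None:
        return False  # unknown or already-used token
    # Case-insensitive match, compared in constant time.
    return hmac.compare_digest(expected.encode(), response.strip().upper().encode())
```

Note what the check actually establishes: that whoever answered could read the challenge, not who they are. That is why a paid human solver gets through just as easily as the blog's intended audience.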

Filtering based on AI-driven text analysis can be effective for blog comments as well as e-mail. The problem is that unless misclassification errors can be kept to just a couple of percent, you're still letting a lot of junk through. Consider a blog that receives 100 comments, 80 of which (typically) are spam. A filter that catches 90 percent of spam will let 8 bogus comments through. Since you had just 20 legitimate comments to begin with, over a quarter of your published comments (8 of 28) are bogus, and that assumes the filter never blocks a legitimate comment.
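
The arithmetic is easy to check, and to vary. The small Python helper below (its name and parameters are purely illustrative) computes the share of published comments that are spam. Setting `false_positive_rate` above zero shows that wrongly blocking some legitimate comments makes the ratio worse, not better.

```python
def published_spam_share(total: int, spam_rate: float, catch_rate: float,
                         false_positive_rate: float = 0.0) -> float:
    """Return the fraction of *published* comments that are spam.

    total: comments received
    spam_rate: fraction of them that are spam (e.g. 0.8)
    catch_rate: fraction of the spam the filter blocks (e.g. 0.9)
    false_positive_rate: fraction of legitimate comments wrongly blocked
    """
    spam = total * spam_rate
    ham = total - spam
    leaked_spam = spam * (1.0 - catch_rate)
    published_ham = ham * (1.0 - false_positive_rate)
    return leaked_spam / (leaked_spam + published_ham)


if __name__ == "__main__":
    # The scenario from the text: 100 comments, 80 spam, 90% of spam caught.
    print(f"{published_spam_share(100, 0.8, 0.90):.1%}")        # 28.6%
    # Same filter, but it also blocks 10% of legitimate comments.
    print(f"{published_spam_share(100, 0.8, 0.90, 0.10):.1%}")  # 30.8%
    # A filter that misses only 2% of spam.
    print(f"{published_spam_share(100, 0.8, 0.98):.1%}")        # 7.4%
```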

Comment spam mitigation technology is obviously a work in progress. Some interesting new work in this area is being pursued by none other than Dries Buytaert, creator of Drupal. Buytaert and his university classmate Benjamin Schrauwen recently introduced Mollom, a comment-filtering SaaS offering that is free for non-commercial users. Both hold doctorates in computer science; Schrauwen's is in machine learning.

Mollom relies mostly on proprietary text analysis techniques but takes a multi-tiered approach. When a comment arrives for analysis, it is classified as ham (good), spam (bad), or uncertain. When the content's quality is uncertain, Mollom issues a CAPTCHA challenge to the submitter; if the submitter passes, the content is marked as good. Buytaert and Schrauwen claim that Mollom (currently used by 1,459 websites) is 99.78 percent effective.
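
The tiered flow can be sketched roughly as follows. This is a hypothetical Python illustration, not Mollom's API: Mollom's scoring is proprietary, so the spam-probability input, the `SPAM_THRESHOLD` and `HAM_THRESHOLD` cut-offs, and the `solve_captcha` callback are all assumptions made for the example.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    HAM = "ham"
    SPAM = "spam"
    UNSURE = "unsure"


# Hypothetical cut-offs on a spam probability in [0, 1]; the real
# service's scoring and thresholds are not public.
SPAM_THRESHOLD = 0.9
HAM_THRESHOLD = 0.1


def classify(spam_probability: float) -> Verdict:
    """Map a score onto the three tiers: ham, spam, or uncertain."""
    if spam_probability >= SPAM_THRESHOLD:
        return Verdict.SPAM
    if spam_probability <= HAM_THRESHOLD:
        return Verdict.HAM
    return Verdict.UNSURE


def moderate(spam_probability: float, solve_captcha: Callable[[], bool]) -> bool:
    """Return True if the comment should be published.

    solve_captcha stands in for presenting a CAPTCHA to the submitter and
    reporting whether they passed; it is called only for uncertain comments.
    """
    verdict = classify(spam_probability)
    if verdict is Verdict.HAM:
        return True
    if verdict is Verdict.SPAM:
        return False
    # Uncertain: fall back to a challenge; passing it marks the comment as good.
    return solve_captcha()
```

One consequence of this design is that the CAPTCHA's friction falls only on the uncertain middle band, so most legitimate posters never see a challenge.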

What makes Mollom better than, say, Akismet? It's hard to know, at this point. Mollom's algorithms are a closely guarded secret (but are likely to be the original work of Schrauwen). Akismet says only that it runs "hundreds of tests" on every incoming comment (which sounds more than a bit Rube Goldberg-ish).

Mollom's most important differentiator may ultimately be its ability to act as an OpenID reputation service. For every incoming request associated with an OpenID, Mollom updates the reputation of that ID based on the scores of the associated comments. Over time, the trustworthiness of any user who has an OpenID becomes a simple table lookup rather than an elaborate exercise in artificial intelligence.
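
A reputation service of this kind boils down to a table keyed by identity. The Python sketch below is hypothetical: the text doesn't say how Mollom updates reputations, so the per-ID ham/spam counts and the Laplace-smoothed trust score (which starts an unknown ID at 0.5) are assumptions chosen for simplicity, and the OpenID URLs are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Reputation:
    ham: int = 0
    spam: int = 0

    def trust(self) -> float:
        # Laplace-smoothed share of good posts (assumed scoring rule).
        return (self.ham + 1) / (self.ham + self.spam + 2)


class ReputationTable:
    """Hypothetical per-OpenID reputation store, updated as comments are scored."""

    def __init__(self) -> None:
        self._table: defaultdict[str, Reputation] = defaultdict(Reputation)

    def record(self, openid: str, is_spam: bool) -> None:
        """Fold one scored comment into the reputation of its OpenID."""
        rep = self._table[openid]
        if is_spam:
            rep.spam += 1
        else:
            rep.ham += 1

    def trust(self, openid: str) -> float:
        # A plain dictionary lookup; no text analysis at query time.
        rep = self._table.get(openid)
        return rep.trust() if rep else 0.5


if __name__ == "__main__":
    table = ReputationTable()
    for _ in range(8):
        table.record("https://alice.example.org/", is_spam=False)
    table.record("https://spammer.example.net/", is_spam=True)
    print(table.trust("https://alice.example.org/"))    # 0.9
    print(table.trust("https://spammer.example.net/"))  # 0.3333333333333333
```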

If you're in the process of selecting a Web CMS and/or Social Software vendor, and you plan to deploy public-facing blogs or wikis, be sure to take comment spam mitigation into account. Human moderation of comments is inherently costly. A SaaS service like Mollom or Akismet may not completely eliminate the need for moderation, but it could be money well spent. One thing is certain: spam is something you need to budget for and architect around. Ask your vendors what kind of help you can expect from them, and don't settle for the sound of crickets chirping.

Categories: Kas Thomas, Collaboration & Community Software, Web Content Management, Implementation, Marketplace at Large, Selecting Technology, Drupal

Copyright, 2001 - 2010, Real Story Group. All rights reserved.

  • Contact Us
  • Copyright Policy
  • Privacy Policy
  • Terms of Use

The Real Story Group

  • CMS Watch
  • Enterprise Information
       Watch
  • SharePoint Watch
  • The Real Story Group

Research

  • Vendor Evaluations
  • Webinars & Advisory Papers
  • Online Education
  • Vendor Lists
  • Free Research Sample
  • Purchase Now

What We Offer

  • Research & Advisory
       Services
  • Frequently Asked Questions
  • Consulting Services
  • Customer Support
  • Contact Sales Team

Who We Are

  • We're Different
  • Our Team
  • Media
  • Customer List
  • Events
  • Contact Us

Get the real story via our bi-weekly newsletter.

Follow us on: RSS twitter

Log In

Remember MeForgot password?