TrendWatch Blog

Drupal, Mollom, and the Future of Blog Spam

11-Jun-2008

Is it just me, or has anyone else been struck by the lack of attention being paid to blog comment spam?

No one needs a reminder of how severe the spam problem is with e-mail. But e-mail spam is just one piece of the spam pie. (Oh man, talk about a hard-to-swallow metaphor...) Somewhere between 80 and 90 percent of comments posted to blogs and/or wikis come from spambots or their human surrogates. Bear in mind, as technologies go, blogging is fairly new compared with e-mail. We're still near the beginning of the blog-spam curve.

To the extent that Social Software and Web CMS vendors sell, bundle, or pre-integrate blog and wiki solutions for you to employ beyond the firewall, they're selling you spam magnets as part of the deal. But they're not necessarily helping you with spam filtration.

You'd expect Social Software purveyors to be pioneers in this area, and some of them have decent services. But surprisingly, many of the vendors covered in our just-published Enterprise Social Software Report 2008: Networking & Collaboration Within and Beyond the Enterprise scored rather poorly on anti-spam capabilities.

Typical remedies for blog spam include comment moderation, challenge-response techniques, and automated filtering based on some combination of reputation assessment and AI-based text analysis. There are problems with all three approaches.

Moderation is tantamount to hand-processing. This is impractical in many cases and will only become more so over time.

A more practical deterrent is the CAPTCHA (a common challenge-response technique). The idea is that if you can correctly identify the letters in a distorted GIF image of a word, you're human, not a spambot, and therefore can be trusted not to post garbage. The CAPTCHA deters robots remarkably well (so far, at least), but it also deters legitimate posters to some extent. (Not everyone wants to play a word game in order to leave a comment.) It will not deter a malicious human. Offshore boiler rooms of paid CAPTCHA-breakers can (and do) still break through.

Filtering based on AI-driven text analysis can be effective for blog comments as well as e-mail. The problem with text analysis is that unless misclassification errors can be kept to just a couple of percent, you're still letting a lot of junk through. Consider a blog that receives 100 comments. Typically, 80 will be spam. An AI-based spam filter that catches 90 percent of spam (and, for simplicity, blocks no legitimate comments) will let 8 bogus comments through. Since you had just 20 legitimate comments to begin with, you're left with a situation where over a quarter of your published comments (8 of 28) are bogus.
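
The arithmetic in that example is easy to check, and to rerun with other numbers. What follows is only a sketch: the function name and parameters are illustrative, and it assumes (as the example does) that the filter never blocks a legitimate comment.

```python
def published_spam(total_comments, spam_rate, catch_rate):
    """Return (spam_let_through, spam_share_of_published_comments).

    Assumes the filter blocks no legitimate comments, so every
    non-spam comment is published.
    """
    spam = total_comments * spam_rate        # bogus comments received
    legit = total_comments - spam            # genuine comments received
    let_through = spam * (1 - catch_rate)    # spam the filter misses
    published = legit + let_through          # everything that gets posted
    return let_through, let_through / published


missed, share = published_spam(100, 0.80, 0.90)
print(f"{missed:.0f} spam comments published, {share:.1%} of all published")
```

Raising `catch_rate` in the call shows how quickly the published share drops once the error rate falls to a couple of percent.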

Comment spam mitigation technology is obviously a work in progress. Some interesting new work in this area is being pursued by none other than Dries Buytaert (creator of Drupal). Buytaert, along with university classmate Benjamin Schrauwen, recently introduced Mollom, a comment-filtering SaaS offering (free for non-commercial users). Buytaert and Schrauwen hold doctorates in computer science. Schrauwen's is in machine learning.

Mollom relies mostly on proprietary text analysis techniques, but takes a multi-tiered approach. When a comment arrives for analysis, it is classified as ham (good), spam (bad), or uncertain. When the content's quality is uncertain, Mollom issues a CAPTCHA challenge to the submitter. If the submitter passes the CAPTCHA test, the content is marked as good. Buytaert and Schrauwen claim that Mollom (currently used by 1,459 websites) is 99.78 percent effective.
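
The tiered flow described above can be sketched in a few lines. This is not Mollom's code or API: the classifier is proprietary, so the sketch assumes a hypothetical spam score between 0 and 1 and made-up thresholds, and the CAPTCHA step is a callback supplied by the caller.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    HAM = "ham"        # good
    SPAM = "spam"      # bad
    UNSURE = "unsure"  # needs a CAPTCHA


def classify(spam_score: float,
             ham_below: float = 0.2,
             spam_above: float = 0.8) -> Verdict:
    """Map a hypothetical 0..1 spam score onto three tiers.

    The thresholds are assumptions chosen for illustration only.
    """
    if spam_score >= spam_above:
        return Verdict.SPAM
    if spam_score <= ham_below:
        return Verdict.HAM
    return Verdict.UNSURE


def accept_comment(spam_score: float,
                   passes_captcha: Callable[[], bool]) -> bool:
    """Publish ham, reject spam, and challenge only the uncertain cases."""
    verdict = classify(spam_score)
    if verdict is Verdict.HAM:
        return True
    if verdict is Verdict.SPAM:
        return False
    # Uncertain: the submitter's CAPTCHA result decides.
    return passes_captcha()
```

The point of the design is that only submitters in the uncertain band ever see a CAPTCHA, so most legitimate posters are spared the word game.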

What makes Mollom better than, say, Akismet? It's hard to know, at this point. Mollom's algorithms are a closely guarded secret (but are likely to be the original work of Schrauwen). Akismet says only that it runs "hundreds of tests" on every incoming comment (which sounds more than a bit Rube Goldberg-ish).

Mollom's most important differentiator may ultimately be its ability to act as an OpenID reputation service. For every incoming request associated with an OpenID value, Mollom updates the reputation of that ID based on the scoring of the associated comment(s). Over time, the trustworthiness of any user who has an OpenID becomes a simple table lookup rather than an elaborate exercise in artificial intelligence.
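
To make the "table lookup" idea concrete, here is a minimal sketch of a reputation table keyed by OpenID. How Mollom actually stores or weighs reputation is not public; the class name, the counters, and both thresholds below are assumptions made purely for illustration.

```python
from collections import defaultdict


class ReputationTable:
    """Per-OpenID tally of ham and spam verdicts (hypothetical design)."""

    def __init__(self, min_comments: int = 5, min_ham_ratio: float = 0.9):
        # openid -> [ham_count, spam_count]
        self._counts = defaultdict(lambda: [0, 0])
        self.min_comments = min_comments
        self.min_ham_ratio = min_ham_ratio

    def record(self, openid: str, is_ham: bool) -> None:
        """Update an ID's history after one of its comments is scored."""
        self._counts[openid][0 if is_ham else 1] += 1

    def is_trusted(self, openid: str) -> bool:
        """A plain lookup: enough history, and almost all of it ham."""
        ham, spam = self._counts.get(openid, (0, 0))
        total = ham + spam
        if total == 0 or total < self.min_comments:
            return False  # unknown or too little history to judge
        return ham / total >= self.min_ham_ratio


table = ReputationTable()
for _ in range(5):
    table.record("https://alice.example/", True)
print(table.is_trusted("https://alice.example/"))
```

An ID with no history is simply untrusted, so a newcomer's comments would still go through text analysis until the ID has built up a record.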

If you're in the process of selecting a Web CMS and/or Social Software vendor, and you plan to deploy public-facing blogs or wikis, be sure to take comment spam mitigation into account. Moderation of comments (by humans) is inherently costly. A SaaS offering like Mollom or Akismet may not completely eliminate the need for moderation but could be money well spent. One thing is certain: spam is something you need to budget for and architect around. Ask your vendors what kind of help you can expect from them. And don't settle for the sound of crickets chirping.

- Submitted by: Kas Thomas, Analyst
