TrendWatch Blog

Solr heads for an even sunnier future

28-Oct-2009

Anyone who's been watching the search space for a while knows that Apache Solr -- the popular open-source search server built on Lucene -- is the elephant in the room for a great many product-selection teams these days. It may be an exaggeration to say that most product-selection discussions begin with "What about Solr?" But not by much.

The release of Solr 1.4 will no doubt only intensify debates over the virtues of build versus buy and open source versus vendor lock-in. Suffice it to say that Solr 1.4 is jam-packed with enhancements designed to make a Lucene-based search system faster, more scalable, easier to replicate, and more flexible as a development platform. Space precludes a full discussion of those things here (for that, you'll need to consult our Search and Information Access Report), but a couple of items deserve quick mention.

One of Solr's signal weaknesses has been its slavish reliance on XML as an input format for document ingestion. If all of your content happens to exist natively in XML form (or can easily be converted to it), chances are you'll be thrilled with Solr from Day One and can use it more or less off the shelf. But if you're trying to index a repository full of Word and PDF documents, what then? It used to be that you were on your own.
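For readers who haven't seen it, here is roughly what "XML as an input format" means in practice: documents are sent to Solr as an `<add>` message of `<doc>` elements, each holding named `<field>` elements. The Python sketch below only builds such a message; the field names (`id`, `title`, `cat`) are hypothetical and would have to match whatever your own schema defines.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML update message: <add><doc><field name=...>...</field></doc></add>.

    `docs` is a list of dicts mapping field name -> value (or a list of values
    for a multi-valued field). The field names are hypothetical and must match
    your schema.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A multi-valued field is written as repeated <field> elements.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree handles XML escaping (&, <, >)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_message([
        {"id": "doc1", "title": "Solr heads for an even sunnier future",
         "cat": ["search", "open source"]},
    ]))
```

A message like this is then POSTed to Solr's update handler, followed by a commit. The point is that every Word or PDF file would first have to be turned into this shape by your own code.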

Enter "Solr Cell" (a play on "Solr CEL," short for Solr Content Extraction Library), a Solr 1.4 feature that uses the content-extraction capabilities of Apache Tika to parse common office document formats. With Solr Cell, you can fairly quickly set Solr up to ingest PDF, OpenDocument, Word, PowerPoint, Excel, RTF, ZIP, and other document formats. This is a welcome development indeed.
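To give a feel for how Solr Cell is used, the sketch below builds the request URL for uploading one file. It assumes a local Solr 1.4 instance whose `solrconfig.xml` maps the ExtractingRequestHandler to `/update/extract`, as the 1.4 example configuration does; the base URL and the extra `source` field in the usage note are hypothetical. `literal.<field>` parameters supply values Tika can't pull from the file itself, such as the document's unique key.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust for your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def build_extract_url(doc_id, literals=None, commit=True, base=SOLR_BASE):
    """Build a Solr Cell (ExtractingRequestHandler) URL for one uploaded file.

    Assumes the handler is registered at /update/extract. `literals` maps
    extra field names to values that are stored alongside whatever Tika
    extracts from the file.
    """
    params = [("literal.id", doc_id)]  # unique key, which Tika can't infer
    for field, value in (literals or {}).items():
        params.append((f"literal.{field}", value))
    if commit:
        params.append(("commit", "true"))  # make the document searchable immediately
    return f"{base}/update/extract?{urlencode(params)}"
```

For example, `build_extract_url("doc1")` yields `http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true`. The file itself then travels in the body of an HTTP POST to that URL (the Solr documentation of the era used a multipart upload along the lines of `curl '<url>' -F "myfile=@report.pdf"`), and Tika works out the format.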

Another area in which Solr and Lucene have traditionally fallen short, but are now set to make big strides, is display technology, or more precisely, support for building displayable widgets. The big news here is AJAX Solr, a JavaScript framework that makes it easy (or at least easier) to design search widgets populated with JSON data obtained via AJAX calls. AJAX Solr is a fork of SolrJS, which began as a Google Summer of Code project in 2008. AJAX Solr doesn't create display widgets itself, but it does the next best thing: it provides AbstractWidget classes and hooks into your choice of popular AJAX helper libraries (Dojo, jQuery, MooTools, Prototype) or even a custom library.
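What a widget actually consumes is Solr's JSON response (requested with `wt=json`). The sketch below is in Python rather than JavaScript and is not the AJAX Solr API; it only illustrates the data shape a widget works with. It assumes Solr's default `json.nl=flat` setting, under which facet counts arrive as a flat list of alternating terms and counts. The `cat` facet and the sample response in the usage note are hypothetical.

```python
import json


def facet_pairs(flat):
    """Turn a flat facet list [term, count, term, count, ...] into (term, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def widget_model(raw_json, facet_field):
    """Reduce a Solr wt=json response to the pieces a results/facet widget needs.

    Assumes json.nl=flat (Solr's default) for facet_counts.facet_fields.
    A facet field missing from the response yields an empty list.
    """
    data = json.loads(raw_json)
    resp = data["response"]
    flat = data.get("facet_counts", {}).get("facet_fields", {}).get(facet_field, [])
    return {
        "total": resp["numFound"],  # total hits, for a pager widget
        "start": resp["start"],     # offset of the first returned doc
        "docs": resp["docs"],       # stored fields, for a result-list widget
        "facets": facet_pairs(flat),  # for a facet/tag-cloud widget
    }
```

Given a hypothetical response such as `{"response": {"numFound": 2, "start": 0, "docs": [...]}, "facet_counts": {"facet_fields": {"cat": ["search", 5, "cms", 3]}}}`, `widget_model(raw, "cat")["facets"]` is `[("search", 5), ("cms", 3)]`. A framework like AJAX Solr handles the fetching and hands this kind of structure to each widget.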

It turns out that Solr 1.4 and Lucene 2.9 also bring a number of much-needed performance enhancements, some of them quite sophisticated. We'll provide more details to our subscribers; for now, let's just say that scaling a Lucene system no longer means the system has to slow to (ahem) a crawl.

Solr isn't all-powerful. It hasn't yet incorporated text-mining tools of the kind that would raise eyebrows at, say, Autonomy or Recommind, and there's still work to be done on relevance ranking. Nevertheless, progress over the past year has been swift. One wonders where Lucene and its satellite projects will take the industry over the next couple of years. Something tells me that any big-search-vendor CTOs who aren't thinking about such things now will have a lot to think about when they wake up from their naps.

- Submitted by: Kas Thomas, Analyst - Twitter: kasthomas
