Taxonomies

Value of Organized Knowledge

by Jack Bryar
21-Jan-2002


An Old Problem Gets Worse

In recent years, the volume of news and information resources available to the typical corporate employee has grown exponentially. Corporate Web traffic has jumped by over 600% annually. Web-available content exceeds several billion items. Executives frequently receive more than 200 e-mails a day. The amount of corporate data generated per employee doubles every 18 months.

    "If you printed the information available through our Intranet, it would stretch from the earth to the sun."
    -- Marc Auckland, World-Wide Chief Knowledge Manager, British Telecom

Corporate managers are worried. Sixty percent say that info-glut is having a negative effect on productivity. IDC estimated that in 1999, US Fortune 500 companies lost $12 billion due to an inability to locate knowledge resources amidst all the clutter. Eighty percent of executives believe the problem will get worse before it gets better.

Adding to the strain is the fact that this content is so difficult to access. Corporate information exists in many forms. Each form (electronic news, email, databases, Web pages, archived documents, etc.) resides in its own format, accessible only through its own index system. Much of it is scattered across the enterprise.

Yet, access is critical. New applications have sprung up requiring access to information across the enterprise, sometimes across multiple businesses. These include next-generation customer care, competitive research, and B2B transactions. In order to get at the information needed to run these applications, information itself needs to be re-structured, and re-organized -- and so does the method of getting at that information.

Info-Illiteracy: A Barrier to Finding Information

The problem of business info-glut is worse than it appears. Many employees lack the skills needed to find the information they require.

For years, putting tools in the hands of the users was considered the best way for companies and their knowledge workers to get their hands on the information they needed. In most cases, that has meant providing users with a search engine similar to systems found on public Websites. Today, many knowledge workers have to navigate as many as six different search engines and database indexes each day.

New research casts doubt on how well search engines work for most users. A study of AltaVista users revealed a surprising amount of info-illiteracy. According to that study:

  • 80% couldn't/wouldn't build a working Boolean search
  • 87% used fewer than three words

Which Babe Did You Mean?

A big part of the problem is that the same term can have different meanings to different people. A search on "Babe," for example, might be meant to find the baseball player Babe Ruth or the film about a talking pig. Not knowing which terms will uncover sought-after information is a significant barrier for many knowledge workers. Any successful strategy for managing information has to overcome this problem.

    In 1814, Thomas Jefferson was so dissatisfied with the ruined and disorganized state of the documents at the Library of Congress that he donated his collection and then personally reclassified all the books there.
    -- Source: Systems of Knowledge Organization for Digital Libraries, Gail Hodge

XML to the Rescue?

The Internet has been described as the world's largest library, with the books thrown all over the floor. Many corporate information systems look just as disorganized. Information managers are convinced that the best solution to this clutter involves wrapping up all electronic document forms inside a common format, so that the content inside can be more easily found, and used by different applications.

The wrapper being used by most organizations today is XML. XML allows the tagging of a document with a description of what the document is about, and where it came from. Searching on XML meta-tags can certainly simplify the search process.
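As a rough illustration of searching on XML meta-tags, here is a minimal Python sketch. The element names (documents, document, meta, subject, source) and the sample records are assumptions made up for this example; they do not come from any standard schema or from a particular product.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper format: element names and sample content are
# illustrative assumptions, not a standard schema.
DOCS_XML = """
<documents>
  <document id="1">
    <meta>
      <subject>Customer Care</subject>
      <source>Intranet</source>
    </meta>
    <body>Guidelines for handling support calls.</body>
  </document>
  <document id="2">
    <meta>
      <subject>Competitive Research</subject>
      <source>Newswire</source>
    </meta>
    <body>Rival firm announces a new product line.</body>
  </document>
</documents>
"""

def find_by_subject(xml_text, subject):
    """Return the ids of documents whose <subject> tag matches exactly."""
    root = ET.fromstring(xml_text.strip())
    return [
        doc.get("id")
        for doc in root.iter("document")
        if doc.findtext("meta/subject") == subject
    ]

def find_by_source(xml_text, source):
    """Return the ids of documents whose <source> tag matches exactly."""
    root = ET.fromstring(xml_text.strip())
    return [
        doc.get("id")
        for doc in root.iter("document")
        if doc.findtext("meta/source") == source
    ]
```

Note that the lookup only succeeds when the searcher types the same subject string the tagger used, which is the weakness discussed next.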

Unfortunately, XML does not solve the problem of finding information. It only standardizes the problem. It requires that any XML tagging system clearly understand what the document is about, and it needs to anticipate the search process someone might try to use to find it. This takes time, a great deal of sophistication, or both. Otherwise, the process results in hiding essential documents behind generic, idiosyncratic or meaningless tags, making the information management and retrieval problem even worse.

In order for XML tagging to be meaningful for search and retrieval, the terms used to tag content have to be intuitive enough to encourage their use by information-seekers. They should be structured in a standardized way; less as a set of variable keywords and more like a set of subject categories. These subject categories should be set up in a hierarchical fashion, with logical subtopics and overviews. This, in short, is a taxonomy.

The ability to search or manipulate content "by category" is an essential benefit of a successful XML tagging process.
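To make the idea of hierarchical subject categories concrete, the following Python sketch stores a small taxonomy as a child-to-parent mapping and retrieves every document filed under a category or any of its subtopics. The category names and document ids are hypothetical, chosen only for illustration.

```python
# Hypothetical taxonomy: each category maps to its parent
# (None marks a top-level category).
PARENT = {
    "Technology": None,
    "Telecommunications": "Technology",
    "Broadband": "Telecommunications",
    "Digital Subscriber Line": "Broadband",
    "Finance": None,
    "Banking": "Finance",
}

# Hypothetical documents, each tagged with one category.
DOCS = {
    "doc-1": "Digital Subscriber Line",
    "doc-2": "Banking",
    "doc-3": "Broadband",
}

def ancestors(category):
    """Return the path from a category up to its top-level topic."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path

def search_by_category(query_category):
    """Return ids of documents filed under the category or any subtopic."""
    return sorted(
        doc_id
        for doc_id, category in DOCS.items()
        if query_category in ancestors(category)
    )
```

With this structure, a search on "Telecommunications" also returns documents tagged only with the narrower "Broadband" or "Digital Subscriber Line" categories, so the searcher does not need to know the most specific term.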

Taxonomies Defined

Taxonomies are sometimes called "classification schemes" or "categorization schemes." Each refers to grouping together similar items into broad "buckets" or "topics" which themselves can be grouped together in ever-broader "hierarchies." Examples of taxonomies include systems as diverse as the Dewey Decimal system found in small libraries, Yahoo's Subject Index, and the massive taxonomic system proposed by Linnaeus and used by generations of biology students. Wherever they are used, they have the same goal -- to organize knowledge about a given subject.

A sample taxonomy from NewsEdge:

[Figure: Sample Taxonomy]

Taxonomies and The Search Process

Perhaps the greatest benefit of taxonomies is improved searching.

Properly constructed taxonomies simplify the process of gathering "the right" information for daily business use by simplifying the vocabulary used in the search process. Tagging systems using raw keywords or similar strategies are likely to generate search error rates approaching those of straight text searches. For example, while a search on the word "DSL" will find stories on a particular type of broadband technology, it will miss others, and may accidentally find content referring to Dutch Sign Language or Data SubLanguage.

A better approach would be to define these documents as belonging to the subject category, "Digital Subscriber Line." If the searcher can focus on a proven set of categories rather than guess at keywords, the chances of finding the right content are far greater, and the process will be faster and more reliable.
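The DSL example can be sketched in a few lines of Python. The three sample documents and their category labels are invented for illustration; the point is only to contrast a raw keyword match with a lookup on an assigned subject category.

```python
import re

# Hypothetical sample documents, each with an assigned subject category.
DOCS = [
    {"id": "a", "category": "Digital Subscriber Line",
     "text": "Carrier rolls out faster DSL service."},
    {"id": "b", "category": "Digital Subscriber Line",
     "text": "Digital subscriber line upgrades planned for next year."},
    {"id": "c", "category": "Dutch Sign Language",
     "text": "Researchers publish a DSL dictionary."},
]

def keyword_search(term):
    """Return ids of documents whose text contains the term as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [d["id"] for d in DOCS if pattern.search(d["text"])]

def category_search(category):
    """Return ids of documents assigned to the given subject category."""
    return [d["id"] for d in DOCS if d["category"] == category]
```

Here the keyword search on "DSL" misses document "b" (which spells the term out) and picks up document "c" (which uses the same abbreviation for something else), while the category search returns exactly the two broadband documents.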

The most important contribution of taxonomies to the search process is that they work.

Even using a relatively primitive taxonomic system, Microsoft reported a 40% improvement in hit rates. Satisfaction metrics doubled. In addition, the time spent trying to find a given document was significantly reduced. The success rate of taxonomy-based searching reduces the strain on systems and on the people who use them.

Business is Complicated

Naturally, one of the most important criteria for a taxonomy is that it should be easy to navigate. But building solid taxonomies is much easier said than done. Consider, for example, a taxonomy of business subjects.

Businesses vary in size and have multiple points of focus. Business activities involve an array of subjects that do not always fall into logical groupings. Subject boundaries are often fuzzy.

Subject hierarchies can feel artificial, as content, particularly business-critical content, may fall into multiple categories. Indeed, most executive-level business documents involve several categories. Traditional indexing schemes dissolve in complexity as the number of unique concepts grows.
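One common way to cope with documents that belong to several categories at once is to let each document carry a set of category tags and to index documents by tag. The Python sketch below shows the idea under that assumption; the document ids and category names are hypothetical, and this is not a description of how any particular product handles the problem.

```python
from collections import defaultdict

# Hypothetical documents, each tagged with several categories.
DOC_CATEGORIES = {
    "merger-memo": {"Mergers & Acquisitions", "Banking", "Regulation"},
    "pricing-study": {"Competitive Research", "Pricing"},
    "bank-audit": {"Banking", "Regulation"},
}

def build_index(doc_categories):
    """Map each category to the set of document ids tagged with it."""
    index = defaultdict(set)
    for doc_id, categories in doc_categories.items():
        for category in categories:
            index[category].add(doc_id)
    return index

INDEX = build_index(DOC_CATEGORIES)

def docs_in_all(*categories):
    """Return ids of documents tagged with every one of the given categories."""
    if not categories:
        return []
    sets = [INDEX.get(category, set()) for category in categories]
    return sorted(set.intersection(*sets))
```

A document such as the merger memo then appears under each of its categories, and a searcher can narrow results by combining categories rather than guessing which single bucket the indexer chose.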

So, while some subjects are relatively easy to categorize, most business functions are not. (I should know: NewsEdge has spent several years developing a proprietary business taxonomy.) Nevertheless, you should seriously consider developing a taxonomy for the content management system residing underneath your e-business efforts in general, and your Intranet in particular. Your content contributors and end-users alike will be grateful.



About the Author

Jack Bryar

Jack Bryar is a Practice Leader in the Knowledge Management Consulting and Editorial Services unit of NewsEdge Corporation. Bryar has helped NewsEdge clients to determine their information needs in industries ranging from energy production to international banking.  Prior to NewsEdge, Bryar led the I.T. Practice of a corporate advisory services consultancy based in Toronto, Canada.


