
A Metadata Primer

by Ann Rockley
13-Feb-2003


Excerpted from Managing Enterprise Content: A Unified Content Strategy, Ann Rockley et al, ISBN:0735713065; Chapter 5. Copyright 2002. Reproduced by permission of New Riders.


As you have no doubt noticed, more information is available than ever before -- on the Web, your company intranet, in your content management repository, and elsewhere. This is both exciting and problematic, and extremely frustrating when you can't find what you're looking for.

What is missing is information about the information -- that is, labeling, cataloging and descriptive information -- that enables a computer to properly process and search the content elements. This information about information is known as metadata.

Although metadata has been a buzzword in the information technology and data warehousing business for some time, it has recently emerged as an important concept for developers of enterprise content management and Web publishing solutions. With more complex authoring processes and information delivery requirements, you need some way of classifying and identifying all of the information or content "bits" so that they can be retrieved and combined in meaningful ways for users. Well-designed metadata can provide the classification and identification you need.

In a unified content strategy, metadata enables content to be retrieved, tracked, and assembled automatically. Metadata enables:

  • Effective retrieval
  • Systematic reuse
  • Automatic routing based on workflow status
  • Tracking of status
  • Reporting

Unified content requires two types of metadata: categorization and element. Users tend to find information based on categorization metadata, whereas authors tend to retrieve information based on element metadata.

Categorization metadata

For years, libraries have used metadata to catalog and categorize documents. Without the card catalog and the Dewey Decimal system it would have been impossible to find content in a library. Without metadata it is nearly impossible to find content online.

The increasing use of portals has encouraged organizations to make the portal the central location for access to organizational content. However, as each new piece of content is added, users' ability to find content decreases. Corporate information needs to be just as accessible as library content, which means organizing content in a logical structure, categorizing it, and using the categories to add metadata to the information.

To begin the process of creating categorization metadata you need to understand your users. Understanding your users helps you to define the ways in which they will retrieve information. Ask the following questions:

  • Who is going to retrieve the content?
    For example, customers will want to retrieve product information, marketing will want to retrieve reports, product information, and industry materials, and employees will want to retrieve policies and procedures.

  • What tasks are they trying to accomplish with the content?
    For example, are they trying to complete a task or make a decision? You need to categorize content for tasks in areas such as procedures, while policies may be categorized separately.

  • What terms will they use when retrieving the content?
    Anticipating the terms that people will use is always difficult. Everyone uses different terms and thinks about information in different ways. You will never be able to ensure that content will be accessible under everyone's terms, but understanding how people will refer to content helps you to determine your taxonomy. After you develop a taxonomy you can educate your users to use the available terms.

Now you need to categorize your content and create a taxonomy. This involves:

  • Grouping or clustering related content
    As you start to categorize your content, you need to start grouping like or similar content together. These groups create categories, which are then refined by individual items in the category. For example:
    • Company benefits
    • Benefit policies
    • Benefit forms
    • Benefit frequently asked questions

    Which can be simplified to:

    • Company benefits
      • Policies
      • Forms
      • Frequently asked questions

  • Developing your taxonomy
    As you group content, categorize it, and define the terms to be used to identify your content, you are automatically creating your taxonomy. Each term in your taxonomy becomes metadata.

  • Testing your taxonomy to ensure that it is appropriate and comprehensive
    You need to ensure that the metadata you have created is appropriate and usable by your audience. Before even using the metadata electronically, categorize some sample content and ask users to perform a usability test to ensure your taxonomy is appropriate.
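
The grouping step above can be sketched in code. This is only a minimal illustration, assuming the taxonomy is stored as a nested dictionary. The names `TAXONOMY` and `taxonomy_terms` and the " > " separator are hypothetical and do not come from any particular content management system.

```python
# Hypothetical taxonomy built from the company benefits example above.
TAXONOMY = {
    "Company benefits": {
        "Policies": {},
        "Forms": {},
        "Frequently asked questions": {},
    },
}


def taxonomy_terms(tree, parent=()):
    """Yield every category path in the taxonomy as a tuple of terms."""
    for term, children in tree.items():
        path = parent + (term,)
        yield path
        yield from taxonomy_terms(children, path)


# Each path is a candidate categorization metadata value.
terms = [" > ".join(path) for path in taxonomy_terms(TAXONOMY)]
```

A flat list of paths like this is also convenient to hand to users when you test the taxonomy on sample content.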

Element metadata

Element metadata identifies your content at the element level, based on the elements defined in your information model (see Chapter 8, "Information modeling"). Authors use element metadata to help them manage content throughout the authoring process. There are three main types of element metadata:

  • Reuse metadata
  • Retrieval metadata
  • Tracking metadata

Metadata for reuse

Metadata for reuse identifies the components of content that can be reused in multiple areas. For example, if an overview already exists for the ABC product, you can use metadata like "content type = overview, product = ABC" to help you find the correct content to reuse.

Before even beginning to write, authors can search the content management system by metadata for reusable content. Alternatively, the content management system can automatically search for appropriate reusable content (based on models and metadata) and deliver it (systematic reuse) to authors. In both cases, metadata is very important to correctly identify the elements of content.
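
As a rough sketch of what a metadata search for reusable content might look like, the following filters a small in-memory list of elements. The element records, the field names, and the `find_reusable` function are assumptions made for illustration. A real content management system would expose its own query interface.

```python
# Hypothetical content elements, each tagged with reuse metadata.
ELEMENTS = [
    {"id": "e1", "content_type": "overview", "product": "ABC"},
    {"id": "e2", "content_type": "warning", "product": "ABC"},
    {"id": "e3", "content_type": "overview", "product": "EFG"},
]


def find_reusable(elements, **criteria):
    """Return the elements whose metadata matches every given criterion."""
    return [
        element
        for element in elements
        if all(element.get(key) == value for key, value in criteria.items())
    ]


# The "content type = overview, product = ABC" search from the text.
matches = find_reusable(ELEMENTS, content_type="overview", product="ABC")
```

Only the element tagged with both values is returned, which is the behavior an author would expect when looking for the existing ABC overview.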

To determine what metadata you need to enable reuse, you need to determine the business result you are trying to achieve and build your metadata backward to achieve that result. Think about the following:

  • Where is content going to be reused?
    Across products? Across information products? If you answered yes to either of these, then you need to create metadata to identify each reuse. For example:

    • Product, such as:
      ABC
      EFG
      HIJ

    • Information product, such as:
      Brochure
      Web
      Help
      User guide

  • What type of content is it?
    You also need to know the element content type for which the content is valid. Your metadata might include:

    • Content type, for example:
      Overview
      Caution
      Warning
      Troubleshooting
      Example

  • What else do you need to know about the content to ensure that the correct piece of content is reused?
    For example, you might also need to know to which version of the product the content applies:

    • Version, for example:
      1
      2
      2.5

    Furthermore, you may need to know the region or location where the product is being sold or used, so that you can identify content such as safety regulations, language, and configuration. In this case, your metadata might include:

    • Region, for example:
      United States
      Canada
      South America
      Europe

    • Language, for example:
      English
      Spanish
      French
      Italian

    Finally, you may need to know the audience so that appropriate content is provided for each audience.

    • Audience, for example:
      Consumer
      Decision maker
      Technical support
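
The reuse categories above can be gathered into a single record per content element. The sketch below is illustrative only: the `ReuseMetadata` class, its field names, and the `select_variant` helper are assumptions, and the sample values are taken from the example lists in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReuseMetadata:
    """One record per content element; field names are illustrative."""
    product: str
    information_product: str
    content_type: str
    version: str
    region: str
    language: str
    audience: str


# Two hypothetical variants of the same warning for different markets.
warning_us_en = ReuseMetadata(
    "ABC", "User guide", "Warning", "2.5", "United States", "English", "Consumer"
)
warning_ca_fr = ReuseMetadata(
    "ABC", "User guide", "Warning", "2.5", "Canada", "French", "Consumer"
)


def select_variant(variants, **context):
    """Return the first variant matching the delivery context, or None."""
    for variant in variants:
        if all(getattr(variant, name) == value for name, value in context.items()):
            return variant
    return None


chosen = select_variant([warning_us_en, warning_ca_fr], region="Canada", language="French")
```

Region and language decide which variant is delivered here. Version and audience can be added to the context in the same way.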

Metadata for retrieval

Metadata for retrieval is used to help authors retrieve content and may include much or all of the metadata used for reuse. However, metadata for retrieval is more extensive than metadata for reuse, providing additional information about an element that facilitates retrieval. Think about what other information would help you retrieve content more effectively. For example, your retrieval metadata might include:

  • Title/Subject
    This type of metadata can be entered by the author, or the system can use the title that appears in the content to create this metadata.
  • Author
    The system usually automatically generates this type of metadata, based on the author information.
  • Date (creation, completion, modification)
    The system usually automatically generates this metadata as it is checked into the content management system.
  • Keywords
    This metadata can be entered by the author; however, it is preferable to provide the author with a list of keywords from which to choose. This way keywords will be used consistently.
  • Security level (who can view the content)
    This type of metadata is usually applied by the author from a selected list of options.
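
To illustrate the split between system-generated and author-supplied retrieval metadata, here is a minimal sketch of a check-in step. The function `build_retrieval_metadata`, its parameters, and the rule that the title is the first line of the content are all assumptions for the example, not the behavior of any specific product.

```python
from datetime import date


def build_retrieval_metadata(content, user, keywords, security_level, today=None):
    """Combine system-generated and author-supplied retrieval metadata.

    Assumes the content is non-empty and that its first line is the title.
    """
    return {
        "title": content.splitlines()[0].strip(),  # derived from the content
        "author": user,                            # derived from the logged-in user
        "modified": today or date.today(),         # stamped at check-in
        "keywords": sorted(keywords),              # chosen by the author
        "security_level": security_level,          # chosen by the author
    }


meta = build_retrieval_metadata(
    "Installing ABC\nStep 1 ...",
    user="arockley",
    keywords=["safety", "installation"],
    security_level="internal",
    today=date(2003, 2, 13),
)
```

Only the keywords and security level require a decision from the author. Everything else can be filled in without adding to the authoring burden.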

Metadata for tracking (status)

Metadata for tracking is particularly useful when you are implementing workflow as part of your unified content strategy. By assigning status metadata to each content element, you can determine which elements are active. You can also control what can be done to an element and who can do it. Generally, status changes based on the metadata are controlled through workflow automation, not by end users. Sometimes an author will identify a status change such as "ready for review" because the system cannot automate this type of information. Status metadata can include such tracking items as:

  • Draft (under development by the author)
  • Ready for review
  • Reviewed
  • Approved
  • Final
  • Submitted
  • Published
  • Archived

Again, like your other metadata, you identify tracking metadata by determining the business result you are trying to achieve, and then build your metadata backward to achieve that result. Design your metadata for tracking after you have designed your workflow. This enables you to identify what metadata needs to be applied to the content at each stage of workflow to enable the workflow system to manage it.
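
One way to let a workflow system control status changes is a table of allowed transitions. The sketch below uses the status values listed above. The transitions themselves, such as sending a reviewed element back to draft, are assumptions made for illustration and would come from your own workflow design. The `change_status` function is hypothetical.

```python
# Hypothetical workflow: each status maps to the statuses that may follow it.
TRANSITIONS = {
    "Draft": {"Ready for review"},
    "Ready for review": {"Reviewed", "Draft"},
    "Reviewed": {"Approved", "Draft"},
    "Approved": {"Final"},
    "Final": {"Submitted"},
    "Submitted": {"Published"},
    "Published": {"Archived"},
    "Archived": set(),
}


def change_status(element, new_status):
    """Return a copy of the element with the new status, if the move is allowed."""
    current = element["status"]
    if new_status not in TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current!r} to {new_status!r}")
    return {**element, "status": new_status}


element = {"id": "e1", "status": "Draft"}
moved = change_status(element, "Ready for review")
```

Because the table is data rather than logic, it can be rewritten whenever the workflow design changes without touching the function.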

After you have designed your metadata to support your workflow, you need to identify other metadata that can help you to track your content. For example:

  • Who created the content (author)?
  • When was it created/modified (date)?
  • Who modified the content (editor)?
  • Who reviewed/approved the content (reviewer/approver)?
  • How long did it take to create/modify/review (time)?
  • Where has it been reused (information product, product)?
  • Has it been translated (content status)?

Most content management systems create some of this metadata automatically (for example, author and date), and other items may already be defined as retrieval or reuse metadata. Even so, go through this exercise to make sure that you have identified all the metadata you require for tracking and reports.

Creating a controlled vocabulary

Metadata needs to be consistent to facilitate reuse, retrieval, and tracking. This requires a controlled vocabulary. A controlled vocabulary reconciles the various words that can be used to identify content and differentiates among the possible meanings that can be attached to certain words. Using an unlimited or uncontrolled set of metadata terms leads to additional work for authors (they have to figure out the metadata each time they apply it) and reduces the percentage of content that can be effectively retrieved (different terms mean either searching with multiple terms or missing some content because retrievers are unaware of alternate terms). If authors can create their own metadata tags, there is a high probability that they will create different metadata for the same kind of content.

To create a controlled vocabulary:

  1. Identify your metadata categories (for example, Content type, Product).
  2. Identify the terms that make up that metadata category.

    For example:

    • Content status
      • Draft
      • Ready for review
      • In review
      • Final
      • In approval
      • Approved

In this example, "content status" is the metadata category and the controlled terms are "Draft," "Ready for review," and so on.

Uncontrolled metadata terms should be the exception to the rule. If possible, do not provide any metadata that can be defined by the author. If that is not possible, monitor the uncontrolled metadata terms to see whether patterns are emerging that could then be used to create a controlled vocabulary.
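
A controlled vocabulary can be enforced with a simple lookup, and uncontrolled terms can be counted to spot emerging patterns. The following is a minimal sketch under those assumptions. The names `CONTROLLED_VOCABULARY`, `is_controlled`, and `recurring_terms` are hypothetical, and the threshold of two uses is arbitrary.

```python
from collections import Counter

# Category -> controlled terms, using the content status example above.
CONTROLLED_VOCABULARY = {
    "content status": {
        "Draft", "Ready for review", "In review",
        "Final", "In approval", "Approved",
    },
}


def is_controlled(category, term):
    """Return True if the term belongs to the category's controlled vocabulary."""
    return term in CONTROLLED_VOCABULARY.get(category, set())


def recurring_terms(free_text_terms, minimum=2):
    """Return uncontrolled terms used at least `minimum` times (case-insensitive)."""
    counts = Counter(term.strip().lower() for term in free_text_terms)
    return {term: count for term, count in counts.items() if count >= minimum}


candidates = recurring_terms(["How-to", "how-to ", "FAQ"])
```

Terms that recur among the uncontrolled entries are candidates for promotion into the controlled vocabulary.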

Ensuring metadata gets used

Metadata can be very valuable and useful; however, it is only valuable if it gets used. Wherever possible, automate the application of metadata. Leaving the application of metadata up to authors adds yet another burden to the authoring process and leads to inconsistency. Some authors diligently apply the correct metadata, some apply some of it correctly and some of it incorrectly, and some don't apply it at all. Unless it is applied appropriately all the time, your metadata could become useless.

Wherever possible, have the system apply the metadata. This can include automatic:

  • Categorization metadata based on the content
  • Metadata based on the template and model
  • Inheritance of metadata based on the parent (for example, if a container element is given a restricted security level, all the elements within the container automatically have the same security metadata applied to them)
  • Metadata based on position in the workflow
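
Inheritance from a parent element is one of the easier cases to automate. The sketch below assumes that elements are nested dictionaries with optional "metadata" and "children" keys. That structure and the `effective_metadata` function are illustrative assumptions, not a description of any particular system.

```python
def effective_metadata(element, parent_metadata=None):
    """Resolve metadata so that a child inherits every value it does not set itself."""
    merged = {**(parent_metadata or {}), **element.get("metadata", {})}
    return {
        "id": element["id"],
        "metadata": merged,
        "children": [
            effective_metadata(child, merged)
            for child in element.get("children", [])
        ],
    }


# A restricted container with two steps that set no security metadata of their own.
container = {
    "id": "c1",
    "metadata": {"security": "restricted"},
    "children": [
        {"id": "s1", "metadata": {"content_type": "step"}},
        {"id": "s2"},
    ],
}
resolved = effective_metadata(container)
```

In this sketch a value set on the child takes precedence over the inherited one. For security metadata you may prefer the opposite rule, so that a child can never be less restricted than its container.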

If it is necessary for authors to add metadata, make it possible for them to add the metadata as they are authoring so that they don't have to wait until the content is checked into the content management system. For example, if a step varies based on role, let them add the role metadata as soon as they finish writing the step. If adding metadata has to wait until the content is checked into the content management system, the system either has to prompt them to add the metadata for every single element (a very tedious process), or it may be up to the author to remember to add the metadata in all the relevant places (a recipe for missed metadata).



About the Author

Ann Rockley

Ann Rockley is President of The Rockley Group. She has built an international reputation in the field of customer-centric XML-based enterprise content management strategies and component-based information architecture through her ground-breaking work in content management and content reuse. Ann is the lead analyst for the XML and Component Content Management Report.


