Names
What Does Your CMS Call This Guy?
by Mark Baker
09-Jan-2002

Introduction
A content management system must manage the relationships of the information objects it contains. There are two ways to relate information objects: linking and naming. Linking creates a specific connection between two (or more) specific information objects. Naming clarifies the names of things referred to in one information object in such a way that it is possible at a later time to create a link to many different objects.
For example, in this passage from the review of Rio Lobo in Leonard Maltin's Video and Movie Guide (Signet, 1994):
<p>Hawks' final film is a lighthearted
Western in the Rio Bravo mold, with the Duke as an ex-Union colonel out to
settle some old scores.</p>
We can create a link from the words "the Duke" to a web site about John Wayne:
<p>Hawks' final film is a lighthearted Western in the Rio Bravo mold,
with <A HREF="http://www.westerns.com/stars/john_wayne/index.html">the
Duke</A> as an ex-Union colonel out to settle
some old scores.</p>
Or we can use markup to clarify that the words "the Duke" are in fact a nickname for the actor named John Wayne:
<p><director name="Howard Hawks">Hawks'</director>
final film is a lighthearted Western in the <movie>Rio
Bravo</movie> mold, with <actor
name="John Wayne">the Duke</actor> as
an ex-Union colonel out to settle some old scores.</p>
The advantage of this second approach is that we have now preserved the information required to form a variety of different links at a later time.
One of the key virtues of a content management system is that it gives you control over how your information is linked, organized, and delivered. Forming relationships based on names rather than links extends that control and increases the range of uses you can make of your content.
The purpose of this paper is to examine the types of names and naming schemes that can be used to form relationships in a content management system.
Naming things
For purposes of naming objects referred to in content we require a naming scheme that:
- adequately identifies the named object for purposes of the processing to be performed to generate links, and
- allows authors or editors to identify objects using a name they know or can easily access.
As we'll see below, these requirements sometimes compete with each other.
There are three types of names to choose from. Each has its advantages and drawbacks.
Common names
The common name is what those familiar with the object normally call it.
Common names seldom make stable identifiers, because they are often ambiguous
and cannot be guaranteed unique. They also tend to change if
- an underlying product changes name,
- the underlying vocabulary changes (as it does frequently in high tech),
- the current name proves unclear to readers, or
- the author needs to reorder conceptual or abstract material.
Abstract names
Abstract names are generally created solely for the purpose of being guaranteed
unique, unambiguous, and stable. For instance, because people share common first
names and surnames with others, most governments assign citizens a number that
identifies them uniquely. Citizens keep the same number for life, even if their
name changes. It is common practice in database systems to generate unique local
key values as abstract names for records in a table.
It is generally good practice to make abstract names meaningless, since meaningful abstract names tend to be adopted as common names and then become subject to all the pressures that change common names.
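As a rough illustration, the following Python sketch assigns a meaningless abstract name to each object and treats the common name as data that is free to change. The NameRegistry class, its methods, and the product names are hypothetical, invented for this example only.

```python
import uuid


class NameRegistry:
    """Hypothetical registry mapping abstract names to common names."""

    def __init__(self):
        self._common_names = {}

    def register(self, common_name):
        # uuid4 yields a random identifier that carries no meaning,
        # so nobody is tempted to adopt it as a common name.
        abstract_name = uuid.uuid4().hex
        self._common_names[abstract_name] = common_name
        return abstract_name

    def rename(self, abstract_name, new_common_name):
        # The common name changes; the abstract name does not.
        self._common_names[abstract_name] = new_common_name

    def common_name(self, abstract_name):
        return self._common_names[abstract_name]


registry = NameRegistry()
product_id = registry.register("WidgetWriter")
registry.rename(product_id, "WidgetWriter Pro")
```

Any reference that stored product_id still resolves after the rename, which is the stability an abstract name is meant to provide.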
Formal names
For some sets of objects there are more formal naming schemes used in place
of common names where greater precision is required. In botany, for instance,
every plant has a unique Latin name, though it may have many common and local
names. Formal names require a formal model and some authority charged with the
maintenance of the model and the collection of formal names associated with
it.
Formal names are usually stable and unique, though not as absolutely so as abstract names (a botanical name may change, for instance, if the plant in question is proved to belong to a different species or family than first thought). The set of formal names often overlaps the set of common names. Houseplants, for instance, are sometimes known by common English names and sometimes by the Latin botanical names.
You can also create a system of formal names for use within your own system. Essentially, such a system would be a collection of human readable names that obey a set of rules that ensure that they are unique and unambiguous within the context in which they are used.
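A minimal sketch of such a home-grown scheme might look like the following. The rule used here (lowercase words joined by hyphens) and the FormalNamespace class are assumptions made for illustration; your own rules would depend on your content.

```python
import re

# Assumed rule for illustration: lowercase words joined by hyphens.
NAME_RULE = re.compile(r"[a-z]+(-[a-z]+)*")


class FormalNamespace:
    """Hypothetical set of formal names for one kind of object."""

    def __init__(self, kind):
        self.kind = kind
        self._names = set()

    def add(self, name):
        # Reject names that do not obey the naming rule.
        if not NAME_RULE.fullmatch(name):
            raise ValueError(f"{name!r} breaks the naming rule for {self.kind}")
        # Reject duplicates so every name stays unique in this namespace.
        if name in self._names:
            raise ValueError(f"{name!r} is already used in {self.kind}")
        self._names.add(name)
        return name


actors = FormalNamespace("actor")
actors.add("john-wayne")
```

Because each namespace keeps its own set of names, the same string could be a valid formal name in two different namespaces without ambiguity.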
Working with names in markup
To understand how names are used in CM systems, let's look at how they are used in relational database systems (RDBMSs). RDBMSs rely on unique local keys (primary keys) as names for establishing relationships between tables. While these local keys are not required to be abstract names, using abstract names is the recommended practice. In well-managed databases, every new record is automatically assigned a unique and unchangeable ID number (often called a "surrogate key") when it is created. This ID number field is used as the local key for the table. When a relationship is formed from a record in one table to a record in another, the local key from one table is entered as a "foreign key" in the related table.
Database tools commonly provide simple mechanisms that developers can use to shield users from the abstract names. This allows users to see common names as they perform linking operations. The user sees a common name, but the database identifies the relationship with the abstract name. For tabular data, this provides the best of both worlds.
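The arrangement can be sketched with Python's built-in sqlite3 module. The table and column names below are assumptions chosen to match the movie example, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (
        id   INTEGER PRIMARY KEY,   -- surrogate key (abstract name)
        name TEXT NOT NULL          -- common name shown to users
    );
    CREATE TABLE movie (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        star_id INTEGER REFERENCES actor(id)   -- foreign key
    );
""")

# The database assigns the ID; the user only ever supplies the name.
actor_id = conn.execute(
    "INSERT INTO actor (name) VALUES (?)", ("John Wayne",)
).lastrowid
conn.execute(
    "INSERT INTO movie (title, star_id) VALUES (?, ?)", ("Rio Lobo", actor_id)
)

# A join lets the interface display common names while the
# relationship itself is stored as the abstract key.
rows = conn.execute("""
    SELECT movie.title, actor.name
    FROM movie JOIN actor ON movie.star_id = actor.id
""").fetchall()
```

If the actor's name were later corrected in the actor table, the movie record would need no change, because it refers to the actor only by the surrogate key.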
However, for other types of material -- descriptive, highly textual, or inherently hierarchical -- documents and document-based tagging schemes such as XML provide no such facility for hiding abstract names.
Content management systems must manage information relationships that occur in both tabular data and descriptive text. They typically do this by subsuming descriptive material into the tabular structure while maintaining full access to the relationships present in the descriptive text. To preserve that access, a system can use XML to mark up a piece of text to indicate:
- That it is a name or a reference to an object (a real world object, or an information object).
- What kind of object it is (what namespace it belongs to).
- The controlled form of the name being used in the namespace.
In the example cited earlier:
<actor name="John Wayne">the Duke</actor>
The tag "actor" specifies that the text "the Duke" is a name (in this case a nickname) for a real world object, and it specifies that the type or namespace for that object is "actor". Actors' names are managed by the Screen Actors Guild, so we have a formal name, "John Wayne", that corresponds to the nickname "the Duke", and this is specified by the "name" attribute.
What this markup does, in database terms, is insert a foreign key into the markup of the information component. Whether or not there is a database table with a local key of "John Wayne" can remain an open question, but the fact that the name is established in its proper namespace means that we can form a relationship with any resource that exposes the local key "John Wayne" -- now or in the future.
This is extremely useful, because it allows you to establish links from older material in your system to newly added material without having to go back and retrofit the old material. As long as the new material is given a name that you can connect to the names embedded in the old content, the link can be formed dynamically when the material is presented. Thus if you subsequently add a biography of John Wayne to your collection of movie-related content, a link from "the Duke" in the Rio Lobo review to this biography can be generated (based on the name "John Wayne") the next time the review is accessed.
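To make the idea concrete, here is a minimal Python sketch of forming links at presentation time. The render function, the resources dictionary, and the biography URL are all hypothetical, and the review markup is abridged. The sketch also assumes that named elements are not nested and that, when no "name" attribute is present, the element text itself is the controlled name.

```python
import xml.etree.ElementTree as ET
from html import escape

# Abridged version of the review markup used earlier.
REVIEW = (
    "<p>A lighthearted Western in the <movie>Rio Bravo</movie> mold, "
    'with <actor name="John Wayne">the Duke</actor> as an ex-Union '
    "colonel out to settle some old scores.</p>"
)


def render(fragment, resources):
    """Turn named elements into links when a matching resource exists."""
    root = ET.fromstring(fragment)
    parts = [escape(root.text or "")]
    for elem in root:
        text = elem.text or ""
        # Assumption: with no name attribute, the text is the name.
        key = (elem.tag, elem.get("name", text))
        url = resources.get(key)
        if url:
            parts.append(f'<a href="{escape(url)}">{escape(text)}</a>')
        else:
            # No resource exposes this name yet, so emit plain text.
            parts.append(escape(text))
        parts.append(escape(elem.tail or ""))
    return "<p>" + "".join(parts) + "</p>"


resources = {}
before = render(REVIEW, resources)

# Later, a biography is added under the name "John Wayne".
resources[("actor", "John Wayne")] = "/biographies/john-wayne.html"
after = render(REVIEW, resources)
```

The review markup is identical in both calls. Only the set of available resources changed, yet the second rendering contains a link from "the Duke" to the biography.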
Naming database records
In some cases, you may want to establish a direct relationship from a phrase in your content to a database record in your system. You may do this either because
- the database record actually contains the thing you wish to refer to, or
- you are employing a database table to contain extended identifying information that you can traverse to link to other useful bits of content.
To make a reference to a database record from your markup, you can create an element that refers to the appropriate table, and use the value of the key field as an attribute value to identify the record. For example, in the sentence
"Jacques Villeneuve's car crossed the line in first place."
you can add markup to relate the words "Jacques Villeneuve's car" to the entry for "Williams" (the name of the Formula 1 team Villeneuve drove for) in the "Car" table with simple markup:
"<car ID="342">Jacques Villeneuve's
car</car> crossed the line in first place."
The problem is that the author looking at this markup has no easy way to verify the accuracy of the link or to understand what the link means.
We can solve this by adding the common name to the markup:
"<car ID="342" name="Williams">Jacques
Villeneuve's car</car> crossed the line in
first place."
Now it is clear what the markup means. (The tag should be generated automatically from the database, both to ensure accuracy and to simplify authoring.)
Of course, this markup has now become rather verbose. In fact, the information object in question is named three times: by its abstract name ("342"), its common name ("Williams") and an ad hoc name ("Jacques Villeneuve's car"). There are advantages to this verbose markup, however, as it greatly improves referential integrity checking and change management.
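One way the redundancy can support integrity checking is sketched below in Python. The CAR_TABLE dictionary stands in for the "Car" table, and the check_references function is a hypothetical helper, not part of any particular CMS.

```python
import xml.etree.ElementTree as ET

# Stand-in for the "Car" table: abstract key -> common name.
CAR_TABLE = {"342": "Williams"}


def check_references(fragment, table, tag="car"):
    """Return a list of problems found in the tagged references."""
    problems = []
    for elem in ET.fromstring(fragment).iter(tag):
        key, name = elem.get("ID"), elem.get("name")
        if key not in table:
            # The abstract name points at no record.
            problems.append(f"ID {key} has no record")
        elif table[key] != name:
            # The abstract and common names disagree.
            problems.append(f"ID {key} is {table[key]!r}, markup says {name!r}")
    return problems


good = (
    '<p><car ID="342" name="Williams">Jacques Villeneuve\'s car</car> '
    "crossed the line in first place.</p>"
)
mismatched = good.replace('name="Williams"', 'name="Ferrari"')

good_problems = check_references(good, CAR_TABLE)
bad_problems = check_references(mismatched, CAR_TABLE)
```

With only the abstract name in the markup, a mistyped ID that happened to match another record would pass unnoticed. Carrying the common name as well gives the check something to compare against.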
Conclusion
Using markup to clarify the meaning of names in your content -- rather than embedding pre-selected links -- opens up many opportunities to provide linking that is appropriate to a particular audience or medium, and to achieve consistent linking policies for your information set. It also simplifies link management and makes life easier for authors. In return, it requires you to think carefully about the naming scheme your content needs.
For more information on this approach to relationship management in a content management system see the OmniMark Technologies white paper "Content Engineering" at http://www.omnimark.com/products/contentengineering.pdf.


