
XML Authoring

Considering Fluency in XML Design

by Mark Baker
29-Mar-2004

Fluency is perhaps the most neglected aspect of XML markup language design. Fluency is the ability of an author to write with ease and confidence. A fluent author has all the tools of language at their fingertips. Anyone who has ever tried to write in a language not their own -- a language they were only beginning to learn -- knows how painful it is to write in a language in which you are not fluent.

The problem is not simply that lack of fluency slows you down. A writer who is not fluent never produces smooth, crisp prose that is easy to read and conveys its meaning clearly and effortlessly. Even the quality of thought suffers, as your mental energy is occupied with the struggle to find the words and fit them to the grammar of the language you are using. Fluency is about having such a mastery of language that it becomes second nature. A musician who is fluent in musical notation sees past notes and staves to read a melody from the page. When that musician composes, the melody in their head flows out as notes on a staff without any conscious thought given to the mechanics of the notation.

If fluency is key to the productivity and quality of intellectual work of all types, then sustaining fluency among content authors should remain central to any organization that seeks to produce high-quality information.

Maiming the Message

In 1990, Marcia Peoples Halio published an article in Academic Computing entitled "Student Writing: Can the Machine Maim the Message?" It reported on an experiment that compared the quality of writing produced by students using a Macintosh with that produced by students using a PC. Halio concluded, based on a wide range of indicators, that the work produced on the Mac was significantly poorer in quality than that produced on a PC. [Ed.: the original article cannot be found online, but much commentary about it can.] Not surprisingly, the article produced a firestorm of controversy, but one does not have to take Halio's experiment at face value to accept that the interface through which authors create a document can affect the quality of their work. Indeed, it is a commonplace of XML apologetics to claim that XML frees authors from worrying about formatting chores, leaving them free to concentrate on writing.

However, if XML has the potential to remove one distraction from authors, it invariably introduces another, potentially more serious one. Worrying about layout and formatting while trying to write in a word processor can certainly interfere with the efficiency and efficacy of the writing process, but at least the graphical aspects of the interface are separate from the natural language in which the author is working. Markup, on the other hand, is inserted directly into the flow of the text itself.

It is important here to distinguish between XML used to implement a semantic markup strategy and XML used simply as the underlying file format of a graphical application. A traditional WYSIWYG word processor can use XML as its file format, as OpenOffice does, for instance. When used like this, XML has no more impact on the fluency of the writer than RTF does. It is below the surface and invisible to the user.

However, many applications of XML are intended to capture semantic information. Such uses must impose themselves on the writer, because they are collecting information from the writer that the writer would not otherwise be recording.

Semantic markup takes upon itself part of the load of carrying the meaning of the text. The markup becomes, in effect, an extension of the natural language in which the piece is written. In some cases, the markup forms a gloss over the text; in others, applying the markup leads to a change in the text, transferring semantic meaning from the text to the markup. The writer's thought is now expressed not through the text alone but through a combination of text and markup.

One example of semantic markup is discussed in my previous CMS Watch article, "What Does Your CMS Call This Guy?" In the following example, markup is added to the text to identify certain words and phrases as references to specific actors, directors, and movies:

<p><director name="Howard Hawks">Hawks'</director> final film is a lighthearted Western in the <movie>Rio Bravo</movie> mold, with <actor name="John Wayne">the Duke</actor> as an ex-Union colonel out to settle some old scores.</p>

The <actor>, <director>, and <movie> tags, along with the "name" attributes, become part of the language in which the review is written. They capture information that, while it is known to the author, would not otherwise be made explicit. This makes the information accessible to dynamic processing and reuse.

The tagging approach shown here is specifically chosen to be easy for the author to use. Fluency is maintained by choosing tag names that are easy to remember and by asking the author for information that they already know and allowing them to supply that information in a straightforward manner. For instance, this markup may be used to generate links, but the authors are not required to enter complex link markup, or to interrupt their thought processes to choose link targets.
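Here is a minimal sketch of how such links might be generated at publishing time, so that the author writes only the semantic tag and the link is computed afterward. It uses Python's standard xml.etree.ElementTree module. The add_links function, the LINK_PATHS table, and the URL layout under movies.example.com are hypothetical placeholders; a real system would resolve names against its own catalog of people and titles.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical base URL and path layout, for illustration only.
BASE_URL = "https://movies.example.com"
LINK_PATHS = {"actor": "people", "director": "people", "movie": "titles"}

# The review paragraph from the example above.
REVIEW = (
    '<p><director name="Howard Hawks">Hawks\'</director> final film is a '
    'lighthearted Western in the <movie>Rio Bravo</movie> mold, with '
    '<actor name="John Wayne">the Duke</actor> as an ex-Union colonel '
    'out to settle some old scores.</p>'
)


def add_links(xml_text):
    """Rewrite actor, director, and movie elements as HTML <a> links."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so retagged ones are not revisited.
    for elem in list(root.iter()):
        path = LINK_PATHS.get(elem.tag)
        if path is None:
            continue
        # Prefer the canonical name attribute; fall back to the visible text.
        key = elem.get("name", elem.text)
        slug = quote(key.replace(" ", "_"))
        # Retag in place so the element's text and tail (the prose that
        # follows it) are kept exactly as the author wrote them.
        elem.tag = "a"
        elem.attrib.clear()
        elem.set("href", f"{BASE_URL}/{path}/{slug}")
    return ET.tostring(root, encoding="unicode")


print(add_links(REVIEW))
```

The author never sees the href values; if the link scheme changes, only the transform changes, not the reviews.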

Semantic markup can be a very powerful tool, especially for the enterprise. It can allow you to automate the publishing process, generate metadata that makes information retrieval easier, and allow content to be reused in many different ways.
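As a small illustration of generating metadata from semantic markup, the following Python sketch (standard library only) collects the people and titles the example review refers to. The extract_references name is hypothetical. It assumes that the name attribute, where present, holds the canonical form and falls back to the element text otherwise, as with the movie tag.

```python
import xml.etree.ElementTree as ET

# The review paragraph from the example above.
REVIEW = (
    '<p><director name="Howard Hawks">Hawks\'</director> final film is a '
    'lighthearted Western in the <movie>Rio Bravo</movie> mold, with '
    '<actor name="John Wayne">the Duke</actor> as an ex-Union colonel '
    'out to settle some old scores.</p>'
)


def extract_references(xml_text):
    """Collect the people and titles a review refers to, grouped by tag."""
    root = ET.fromstring(xml_text)
    refs = {"actor": [], "director": [], "movie": []}
    for elem in root.iter():
        if elem.tag in refs:
            # The name attribute carries the canonical form ("John Wayne")
            # even when the prose uses a nickname ("the Duke").
            refs[elem.tag].append(elem.get("name", elem.text))
    return refs


print(extract_references(REVIEW))
# {'actor': ['John Wayne'], 'director': ['Howard Hawks'], 'movie': ['Rio Bravo']}
```

A retrieval system could index the review under these values, so a search for "John Wayne" finds it even though the text only says "the Duke".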

But as an intrusion into the language of composition, semantic markup has a direct impact on the fluency of the author. Nor is the difficulty removed by hiding the physical tags in a structured editor. It is not the angle brackets that impede fluency, but the intrusion of element and attribute names and complex content models into the author's language of composition. In semantic markup, these things must remain visible to the author no matter what style of authoring interface is used.

If you want to maintain and improve the productivity of authors and the quality of the work they produce, you must be careful that the systems you provide for them do not damage their fluency. It will be a Pyrrhic victory if you improve the speed of publishing and ease of retrieval but damage the quality of the content.

In Praise of Lucidity

The property that provides for the easy development of fluency is lucidity. How do we ensure the lucidity of markup languages? First, we must recognize a markup language for what it is: a user interface. Whether they type tags, input text via a browser, or use some sort of structured editor, writers interact with a tagging language as an interface for creating text. A tagging language is a user interface and as such, user interface design principles need to be applied to it.

The terms "markup language" and "tagging language", common in the SGML world, are less commonly used in discussing XML. Certainly not all XML schemas can reasonably be called tagging languages. Many schemas simply describe data structures or communication protocols. However, when it comes to schemas that support semantic markup of content, it is important to remember that these are indeed languages in which the content will be marked up, usually by the author of the content. Using the terms "tagging language" and "markup language" reminds us that here we are not simply defining data structures; we are in the business of describing actual languages that will be used by human beings to capture and describe information.

Many tagging languages are developed as elaborate logical models of document structure. They can run to hundreds of different tags that can be used in thousands of combinations. But if you relieve authors of worrying about complex formatting issues only to require them to worry about complex structural issues, you are no better off. Or at least, they are no better off -- you, as a content manager, may have successfully offloaded some of your issues onto their shoulders. But both writer and reader may suffer for your convenience.

For instance, you may determine that you can improve the speed and consistency of your corporate publishing function if all documents are created using the DocBook document type definition (DTD). DocBook can be used to drive a central publishing operation, but it is a complex DTD with hundreds of tags and some highly complex content models. Asking all the authors in your company to switch from word processors to DocBook might make your life easier, but it will make their lives much more difficult.

On the other hand, if you give authors highly lucid tagging languages designed just for them, you can make their lives easier and you can transform those individual lucid languages into a far more consistent DocBook than you would ever get from having people author directly in DocBook itself. (Yes, some word processors can export documents to DocBook, but only to a subset that matches the semantics of the word processor, and since this process imposes no discipline on the author, the results are no more consistent than you would get from traditional word processor formats.)

Transformation enables lucidity

The greatest virtue of XML is that it is easy to transform one XML tagging language into another. You may need an elaborate document structure language to drive your publishing process, but that doesn't mean that you have to force authors to create content directly in that language. You can create a lucid tagging language specifically to support the work of the authors, and then transform from that language into your document structure language.

The movie review language in the example above is a lucid language designed specifically to capture semantic information relevant to movie reviews in a way that movie reviewers can relate to. If you have a DocBook-based publishing system, it is easier to have movie reviews authored in the movie review language and then transform from that language to DocBook prior to publishing. As part of this transformation process, you can resolve the linking possibilities of the actor, movie, and director tags in any one of a number of ways. You not only get ease of use for authors, you also get a wider range of publishing alternatives.
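A minimal sketch of such a transformation in Python follows. The mapping is an illustrative assumption, not a prescribed one: <p> becomes DocBook <para>, <movie> becomes <citetitle>, and <actor> and <director> become <phrase role="..."> followed by an <indexterm> that carries the canonical name. The review_to_docbook function is hypothetical and handles only a single flat paragraph like the example; the same mapping could just as well be written in XSLT.

```python
import xml.etree.ElementTree as ET

# The review paragraph from the example above.
REVIEW = (
    '<p><director name="Howard Hawks">Hawks\'</director> final film is a '
    'lighthearted Western in the <movie>Rio Bravo</movie> mold, with '
    '<actor name="John Wayne">the Duke</actor> as an ex-Union colonel '
    'out to settle some old scores.</p>'
)


def review_to_docbook(xml_text):
    """Map one flat review paragraph onto DocBook-style inline markup."""
    src = ET.fromstring(xml_text)
    if src.tag != "p":
        raise ValueError("this sketch only handles a single <p>")
    para = ET.Element("para")
    para.text = src.text
    for child in src:
        if child.tag == "movie":
            out = ET.SubElement(para, "citetitle")
            out.text = child.text
            out.tail = child.tail
        elif child.tag in ("actor", "director"):
            phrase = ET.SubElement(para, "phrase", role=child.tag)
            phrase.text = child.text
            # Resolve the canonical name into an index entry placed right
            # after the phrase, so "the Duke" is indexed under "John Wayne".
            term = ET.SubElement(para, "indexterm")
            ET.SubElement(term, "primary").text = child.get("name", child.text)
            term.tail = child.tail
        else:
            raise ValueError(f"unexpected element <{child.tag}>")
    return ET.tostring(para, encoding="unicode")


print(review_to_docbook(REVIEW))
```

Choosing index entries here is only one way to resolve the semantic tags; the same input could equally drive links, metadata, or a different target vocabulary.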

Designing a lucid tagging language

How do you create a lucid tagging language? Here are some basic design guidelines:

  1. Keep it simple. You are asking people to learn this language and use it as an extension of their natural language of composition. If it is simple, this learning will not be too difficult. If it gets complex, it will quickly become a major impediment to fluency.
  2. Keep it small. Small is the heart of simple. Create no more tags than you absolutely need.
  3. Keep it flat. Documents may be hierarchical in structure, but understand that this is a problem, not a virtue. Hierarchies are difficult to track, and they generate complexity. Flatten every structure you can. Many hierarchies can be flattened into sequences with no real loss of information. You can always transform them back into hierarchies for publishing if you need to. For example, while it seems logical to make sections and subsections into hierarchical elements, and then make section titles part of the content model of a section, you can make life a lot easier for the author by simply giving them a <subhead> tag at the same structural level as your <p> tags. You can then transform this flat markup into hierarchical markup simply by collecting up the content between one subhead tag and the next subhead tag or other feature that would mark the end of a section.
  4. Keep it topical. Authors know their subjects. They may not know or care anything about your publishing software or your metadata schema. But they know the subjects they write about. Design your tagging language to capture data in the categories that make sense to authors based on the subjects they are writing about.
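The flattening idea in point 3 can be sketched in Python as follows. The <doc> wrapper, the <section> and <title> output elements, and the nest_sections function are hypothetical names chosen for illustration. For brevity the sketch handles a single level of subheads and treats only the next <subhead> or the end of the document as the end of a section; content before the first subhead stays at the top level.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat document: subheads are siblings of the paragraphs.
FLAT = (
    "<doc>"
    "<p>Intro paragraph.</p>"
    "<subhead>First topic</subhead>"
    "<p>Body of the first topic.</p>"
    "<p>More on the first topic.</p>"
    "<subhead>Second topic</subhead>"
    "<p>Body of the second topic.</p>"
    "</doc>"
)


def nest_sections(xml_text):
    """Group the content between subheads into nested <section> elements."""
    flat = ET.fromstring(xml_text)
    doc = ET.Element("doc")
    section = None  # the section currently collecting content, if any
    for child in flat:
        if child.tag == "subhead":
            # A subhead closes the previous section and opens a new one;
            # its text becomes the new section's title.
            section = ET.SubElement(doc, "section")
            ET.SubElement(section, "title").text = child.text
        elif section is None:
            doc.append(child)  # content before the first subhead
        else:
            section.append(child)
    return ET.tostring(doc, encoding="unicode")


print(nest_sections(FLAT))
```

The author only ever types <subhead> and <p> in sequence; the nesting that a publishing DTD might require is produced mechanically.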

Don't be afraid to create several different topical tagging languages for capturing information about different subjects. You can roll them up into one common tagging language later if that is what your process requires. Nothing is more important in running any kind of information management system than getting good data from the start. Creating lucid tagging languages that improve the quality of your data capture will pay dividends all through your content management system.

Improving Your CMS Initiatives

Getting the cooperation of authors is one of the biggest challenges that content management leaders face, both at the local level and at the enterprise level. Authors are already enormously reluctant to abandon their familiar authoring tools and methods. You can greatly improve your chances of getting authors to cooperate with your content management initiatives if you demonstrate that you understand the importance of protecting their fluency, and if you show that you are willing and able to create lucid interfaces that let authors create content without loss of fluency.



About the Author

Mark Baker

Mark Baker is the principal of Analecta Communications, a communication consulting firm in Ottawa, Canada. His former positions include Manager of Information Engineering methods at Nortel and Director of Communications for OmniMark Technologies. Mark has written and spoken extensively on single sourcing and markup. He is co-author of HTML Unleashed, 2nd Edition and author of Internet Programming with OmniMark. He is currently writing a book on refactoring content.


