XSLT 2.0
A Transforming Experience for Content Management?
by Steve Heckler
04-Mar-2004

During the past five years, XSLT (Extensible Stylesheet
Language Transformations) has emerged as the "Babelfish" of the XML world,
translating XML documents into other XML and text formats. XSLT combines a
versatile, tag-based scripting language with XPath, a powerful language for
selecting specific sections or data within XML documents. The advent of XSLT
2.0 could mean even more power and capability for XML-based content management
systems.
In XML-based CMS systems, XSLT provides the ability to select specific sections
of one or more XML documents, and then return this data in any desired format,
such as HTML, WML, PDF, or some other XML format for syndication. This is
important, because XML doesn't describe how content appears, but rather
what it's about. You need XSLT to make your XML content
human-consumable and sharable with others' applications.
XSLT Basics
As background, let's review some basics of XSLT. Mechanically, an XSL transformation relies on two ingredients (and an optional third ingredient):
- An XML document to transform
- An XSL stylesheet describing how the transformation will take place
- Values for optional parameters that the stylesheet may contain
An XSLT engine (such as the ones built into many CMS systems, Java 2 Standard Edition, and the .NET Framework Class Library) then transforms the XML document via the XSL stylesheet, producing output in the format specified by the stylesheet.
XML + XSL + (optional) Parameters = Output
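To make the optional third ingredient concrete, here is a minimal sketch of a stylesheet that declares a parameter. It is written in XSLT 1.0 syntax and assumes input shaped like the customers document in the example that follows. The parameter name country and its default value are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameter; the caller may override the default 'USA' -->
  <xsl:param name="country" select="'USA'"/>

  <xsl:template match="/">
    <!-- Emit the last name of each customer whose citizen attribute matches -->
    <xsl:for-each select="customers/cust[@citizen = $country]">
      <xsl:value-of select="name/lastname"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

How the value is supplied depends on the engine. In Java, for instance, it would typically be passed with Transformer.setParameter before the transform is run.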
As an example, let's say we want to transform the following XML document into HTML:
<?xml version="1.0"?>
<customers xmlns:xsi=
"http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="customers.xsd">
<cust gend="male" citizen="USA" nbuyer="false">
<customerid>1</customerid>
<name>
<firstname>Jorge</firstname>
<lastname>Garcia</lastname>
</name>
<vehiclepref flexible="true">3</vehiclepref>
</cust>
<cust gend="female" citizen="Canada" nbuyer="true">
<customerid>2</customerid>
<name>
<firstname>LaTonya</firstname>
<lastname>Campbell</lastname>
</name>
<vehiclepref flexible="false">2</vehiclepref>
</cust>
<cust gend="female" citizen="Niger" nbuyer="false">
<customerid>3</customerid>
<name>
<firstname>Ife</firstname>
<lastname>Ogoni</lastname>
</name>
<vehiclepref flexible="true">4</vehiclepref>
</cust>
</customers>
We would build the following stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsl:template match="/">
<html>
<head><title>Customers</title></head>
<body>
<div>
<table border="1">
<tr>
<th>Gender</th>
<th>Citizenship</th>
<th>New Buyer?</th>
<th>Cust. ID</th>
<th>First Name</th>
<th>Last Name</th>
<th>Vehicle Pref.</th>
</tr>
<xsl:for-each select="customers/cust">
<xsl:sort select="name/lastname" />
<tr>
<td>
<xsl:value-of select="@gend"/>
</td>
<td>
<xsl:value-of select="@citizen"/>
</td>
<td>
<xsl:value-of select="@nbuyer"/>
</td>
<td>
<xsl:value-of select="customerid"/>
</td>
<td>
<xsl:value-of select="name/firstname"/>
</td>
<td>
<xsl:value-of select="name/lastname"/>
</td>
<td>
<xsl:value-of select="vehiclepref"/>
</td>
</tr>
</xsl:for-each>
</table>
</div>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
It would generate the following HTML:

[Figure: HTML output from XSL transform of XML document]
So What's Important About XSLT 2.0?
XSLT 2.0 is currently a "Working Draft," with the latest draft (as of this writing) dated November 12, 2003. It should achieve W3C final recommendation status this year. In conjunction with XPath 2.0, which is being developed on the same schedule, XSLT 2.0 promises to offer CMS developers and integrators a wide range of new capabilities, including:
- Grouping
- Functions
- Data type checking
- String handling and regular expressions
- Transclusion
- Temporary trees
Grouping
XSLT 1.0 allows the output of a transformation to be sorted, but does not support SQL-style grouping. With XSLT 2.0's grouping features, you can sort elements in a dataset into groups, then generate reports for those groups. For example, you could group employees in an XML document by job title, then generate reports on average salaries, minimum and maximum length of service, etc.
In a CMS, this could reduce costly user "roundtripping" back to the server to re-sort and re-group data, and could enable the development of analytically richer tables.
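As a rough illustration of the employee example, the sketch below uses the xsl:for-each-group instruction as it appears in the final XSLT 2.0 Recommendation; details in the working drafts may differ. The input structure (an employees root holding employee elements with title and salary children) is an assumption made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input: <employees><employee><title/><salary/></employee>...</employees> -->
  <xsl:template match="/employees">
    <report>
      <!-- Form one group per distinct job title -->
      <xsl:for-each-group select="employee" group-by="title">
        <!-- Order the groups alphabetically by their title -->
        <xsl:sort select="current-grouping-key()"/>
        <!-- current-group() holds every employee in the group being processed -->
        <group title="{current-grouping-key()}"
               count="{count(current-group())}"
               average-salary="{avg(current-group()/salary)}"/>
      </xsl:for-each-group>
    </report>
  </xsl:template>
</xsl:stylesheet>
```

In XSLT 1.0 the same report usually requires the key-based workaround known as Muenchian grouping, which is considerably harder to read.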
Functions
XSLT 1.0, while powerful, lacked a feature available in virtually every other language: user-defined functions! XSLT 2.0 now allows you to define your own functions and call them from anywhere in your stylesheet. These functions can be passed parameters and return values, just like functions in any other language.
This eliminates one of the biggest "knocks" against XML-based CMS systems for developers, who today frequently have to resort to embedding cumbersome code snippets in XSL nodes and vice-versa.
These functions can also be reused across multiple stylesheets, giving you the opportunity to reuse code that you've developed for one XSL transformation across multiple other transformations. Ultimately, this should shorten the process of writing stylesheets that generate output from your XML documents.
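A user-defined function might look like the following sketch, which follows the final XSLT 2.0 Recommendation and assumes the customers document shown earlier as input. The function name my:full-name and its namespace URI are hypothetical; user-defined functions must live in a namespace of your choosing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="xs my">
  <xsl:output method="text"/>

  <!-- Hypothetical function: builds "Firstname Lastname" from a cust element -->
  <xsl:function name="my:full-name" as="xs:string">
    <xsl:param name="cust" as="element(cust)"/>
    <xsl:sequence
        select="concat($cust/name/firstname, ' ', $cust/name/lastname)"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:for-each select="customers/cust">
      <!-- The function is called like any built-in XPath function -->
      <xsl:value-of select="my:full-name(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

To share the function, one option is to keep it in its own stylesheet module and pull that module into other stylesheets with xsl:include or xsl:import.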
Data type checking
In writing your stylesheets, you will often need to make an operation conditional on the type of a piece of data in the XML document. Is it a string, a number, a date, a boolean value, or something else? XSLT 2.0 introduces support for the XML Schema data types and the ability to check whether a piece of data can be cast to a particular type.
This becomes especially important as CMS packages (not to mention MS Office 2003) increasingly align around Schemas.
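The sketch below shows one way such a check could be written, using the castable as operator from XPath 2.0 as finally standardized. It assumes the customers document shown earlier; the choice to test vehiclepref against xs:integer is purely illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="customers/cust">
      <xsl:value-of select="name/lastname"/>
      <xsl:text>: </xsl:text>
      <xsl:choose>
        <!-- Only treat the value as a number if it can be cast to one -->
        <xsl:when test="vehiclepref castable as xs:integer">
          <xsl:text>vehicle preference is the integer </xsl:text>
          <xsl:value-of select="xs:integer(vehiclepref)"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>vehicle preference is not an integer</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```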
String handling and regular expression support
String handling has improved immeasurably in XSLT 2.0 through the addition of several new functions:
- compare() allows you to compare two strings according to a collation (typically Unicode code point order by default), returning -1 if the first string sorts before the second, 0 if the strings are equal, and 1 if the first string sorts after the second.
- replace() allows you to replace every substring that matches a specified pattern with another string. This is enormously helpful for changing the values of data in an XML document prior to generating output from it.
- string-join() and tokenize() allow you to join and break apart delimited data, such as records read in from a comma-delimited text file.
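The functions above can be tried out with a small sketch like the one below, written against the final XPath 2.0 function library. The sample strings are made up, and the stylesheet ignores its input, so any XML document will do as the source.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- compare(): 'apple' sorts before 'banana', so this writes -1 -->
    <xsl:value-of select="compare('apple', 'banana')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- replace(): every dash becomes a slash, giving 2004/03/04 -->
    <xsl:value-of select="replace('2004-03-04', '-', '/')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- tokenize(): split a comma-delimited record into three strings,
         written out here as red | green | blue -->
    <xsl:value-of select="tokenize('red,green,blue', ',')" separator=" | "/>
    <xsl:text>&#10;</xsl:text>

    <!-- string-join(): glue a sequence back together, giving a-b-c -->
    <xsl:value-of select="string-join(('a', 'b', 'c'), '-')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```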
Regular expressions, supported by most modern programming languages, allow you to check whether a string contains a specific pattern. For example, the following pattern would check for a validly formatted US Social Security Number (which consists of 3 digits, a dash, two digits, a dash, and four digits):
^\d{3}-\d{2}-\d{4}$
The ^ at the start of the expression prevents any text from appearing before the pattern and the $ at the end of the pattern prevents any text from appearing afterwards.
In XSLT 2.0, regular expressions can be used:
- To establish conditionals for operations to take place. e.g., a portion of the stylesheet processing a set of employee records might be written to only print out those employees whose last names begin with "Sm" and end with "th."
- To specify patterns that should be replaced with other text. e.g., you might have XML documents describing products that are sold by your firm but made by your suppliers. The suppliers might provide you with XML documents that have their name left in; your stylesheet could replace these firms' names with your firm's name "on the fly" anytime an XML document is requested.
- To extract specific pieces of text from a longer piece of text. e.g., in a Social Security Number (such as 595-12-3456), you might want to extract and operate separately on each of the three pieces of the number.
- To break up a piece of text into pieces, with splits between pieces occurring at the points where a certain text pattern occurs.
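The four uses listed above could be sketched roughly as follows, using matches(), replace(), xsl:analyze-string, and tokenize() as they appear in the final XSLT 2.0 and XPath 2.0 Recommendations. The variable names and the supplier and firm names are hypothetical; the Social Security Number is the sample value from the list. The stylesheet ignores its input document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical sample values -->
  <xsl:variable name="ssn" select="'595-12-3456'"/>
  <xsl:variable name="blurb" select="'Widget made by Acme Supply'"/>

  <xsl:template match="/">
    <!-- 1. Conditional: act only when the value fits the pattern -->
    <xsl:if test="matches($ssn, '^\d{3}-\d{2}-\d{4}$')">
      <xsl:text>validly formatted&#10;</xsl:text>
    </xsl:if>

    <!-- 2. Replacement: swap the supplier's name for your own -->
    <xsl:value-of select="replace($blurb, 'Acme Supply', 'Our Firm')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- 3. Extraction: capture each of the three pieces separately.
         The regex attribute is an attribute value template, so literal
         curly braces must be doubled. -->
    <xsl:analyze-string select="$ssn"
        regex="^(\d{{3}})-(\d{{2}})-(\d{{4}})$">
      <xsl:matching-substring>
        <xsl:value-of
            select="regex-group(1), regex-group(2), regex-group(3)"
            separator=" / "/>
        <xsl:text>&#10;</xsl:text>
      </xsl:matching-substring>
    </xsl:analyze-string>

    <!-- 4. Splitting: break the text apart wherever a dash occurs -->
    <xsl:value-of select="tokenize($ssn, '-')" separator=" "/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The "Sm...th" test from the first bullet would follow the same shape as step 1, with a pattern such as '^Sm.*th$' applied to the last name.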
String handling and regular expressions find their way into almost any serious content management development project, where companies frequently want to munge and recombine pieces of content. This is especially critical for initial content migration into a new CMS (one reason why most CMS developers keep a bevy of Perl scripts handy).
In recent interviews with CMS vendors, both Doug Domeny of Ektron and Vern Imrich of Percussion identified regular expression support as a top item on their developers' wish lists. According to Percussion's Imrich, "Currently, we use a mix of Java and XSLT to do content filtering based on patterns. Once XSLT 2.0 is finalized, we anticipate being able to do all of this in the XSLT stylesheet, which will help make our applications more modular and allow us to implement new filtering features by writing or upgrading XSLT stylesheets, rather than Java code." He also noted that XSLT 2.0's regular expression-based "replace" feature should be especially handy for helping clean up XML generated from Microsoft Word.
Transclusion
XSLT 2.0 allows you to dynamically include data from other XML documents at runtime. In a content management system, this could provide an easy way to retrieve a standard header or footer for use in displaying the page, or to retrieve a piece of content that needs to be shared across documents.
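A shared header and footer might be pulled in along the lines of the sketch below, which uses the doc() function from the final XPath 2.0 function library. The file names header.xml and footer.xml are hypothetical; relative names like these are resolved against the location of the stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- Copy the root element of the shared header document -->
        <xsl:copy-of select="doc('header.xml')/*"/>
        <!-- Process the main content document as usual -->
        <xsl:apply-templates/>
        <!-- Copy the root element of the shared footer document -->
        <xsl:copy-of select="doc('footer.xml')/*"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```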
Temporary Trees
XSLT 2.0 allows partial XML documents to be stored in variables and parameters (rather than just treating this data as plain text). In CMS systems, this and transclusion will be invaluable for facilitating the creation of compound documents "on the fly," based on data from different sources.
Currently, CMS vendors use proprietary extensions to XSLT to store these temporary trees. For example, Ektron currently uses the MSXML parser's NodeSet extension in some of its products. According to Doug Domeny of Ektron, "it will be great to have temporary tree capability built into XSLT 2.0, so that the stylesheets we write that use this feature are compatible with any XSLT platform."
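The following sketch shows the idea without any vendor extension, following the final XSLT 2.0 Recommendation and assuming the customers document shown earlier. The labels variable and the output element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- A temporary tree: the variable holds real nodes, not flattened text -->
  <xsl:variable name="labels">
    <label code="true">Yes</label>
    <label code="false">No</label>
  </xsl:variable>

  <xsl:template match="/">
    <buyers>
      <xsl:for-each select="customers/cust">
        <!-- Navigate into the variable with an ordinary path expression;
             current() refers to the cust element being processed -->
        <buyer name="{name/lastname}"
               new="{$labels/label[@code = current()/@nbuyer]}"/>
      </xsl:for-each>
    </buyers>
  </xsl:template>
</xsl:stylesheet>
```

In XSLT 1.0, the path expression into the variable is not allowed without first converting it through a node-set extension function, which is the portability problem Domeny describes.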
Other Key Features of XSLT 2.0
- Better internationalization support, including international support for string sorting and comparison. Ektron's Domeny cited this as a key item on his wish list.
- Multiple document output capability, which will allow a single stylesheet to generate multiple outputs in multiple formats.
- Parsing of non-XML documents, such as HTML files, email messages, and delimited files.
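Multiple document output, for example, could look like the sketch below, which uses the xsl:result-document instruction from the final XSLT 2.0 Recommendation and assumes the customers document shown earlier. The output file naming scheme is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:for-each select="customers/cust">
      <!-- One HTML file per customer, named after the customer ID -->
      <xsl:result-document href="customer-{customerid}.html" method="html">
        <html>
          <body>
            <p>
              <!-- A sequence is written with a space between its items -->
              <xsl:value-of select="name/firstname, name/lastname"/>
            </p>
          </body>
        </html>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Where the files end up depends on how the processor resolves the href value, usually relative to the location of the main output.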
Has Any CMS Vendor Implemented XSLT 2.0 Yet?
Because the XSLT 2.0 specification is still in some flux, no CMS vendors that I spoke with have implemented XSLT 2.0. Once XSLT 2.0 achieves Final Recommendation status (hopefully later this year), several CMS vendors anticipate making swift use of the technology in their products.
In the interim, if you wish to hone your XSLT 2.0 skills, XSLT 2.0 is supported now by versions 7.8 and later of Saxon, a popular, Java-based XSLT and XQuery processor written by Michael Kay. You can obtain it at http://saxon.sourceforge.net/. XSLT 2.0 support on the .NET platform does not exist yet, but should emerge soon after the XSLT 2.0 spec achieves final recommendation status.
Resources
- http://www.xml.com/pub/au/42: Links to a set of superb articles on XSLT, including XSLT 2.0, by Bob DuCharme. This is a great resource for finding code examples for almost all of the new XSLT 2.0 features discussed in this article.
- http://www.w3.org/TR/xslt20/: The latest draft of the XSLT 2.0 specification.
- http://www.w3.org/TR/xpath20/: The latest draft of the specification for XPath 2.0, the XML pathing and query language used by XSLT 2.0.


