SEO and Your CMS
Embedding SEO Best Practices in CMS Implementations
by Randy Woods and Julie Batten
20-Aug-2006

Because web content management solutions provide centralized control over web site layout and presentation, they can be used to enforce many search engine optimization ("SEO") best practices. A CMS implementation that is SEO-aware can improve the ease of indexing and likelihood of better ranking for your content. Ideally, your CMS will also ensure that new content being added to the site appears in a search engine friendly format.
The following outlines ways in which a CMS can be configured to embed widely-recognized SEO tactics. We have categorized these tactics into groups that correspond with the capabilities of most content management systems, including:
- System-wide considerations
- Template-level considerations
- Page-level factors.
Let's look at each category in turn.
System-Wide Considerations
Enforce W3C Compliant Code
It is widely recognized that standards-compliant code is treated more favorably by major search engines. Most mature CMS solutions allow you to define and "lock down" the code used in templates, prohibiting content contributors from making deep changes to page layouts. However, support for pure CSS-based layout varies from system to system, and some packages with visual template builders still default to table-based layout.
By validating these templates prior to deployment you can ensure that the layout itself will be W3C compliant (text content is another matter -- as we'll see below). Some CMS solutions incorporate an optional code validation utility. Third party validation is also available, for example from Watchfire.com.
Create Site Maps
Search engines love crawlable site maps. Virtually all CMS solutions allow automated generation and updating of site maps. Current SEO practice suggests restricting the number of links on any one page to fewer than 100. Of course, this may change in the future. The point is that your CMS should allow you to create a series of hierarchical site maps as necessary to provide spiders with quick access to all site content.
Separately, Google and Yahoo! now allow site managers to submit special, XML-formatted sitemaps to facilitate accurate crawling. To the best of our knowledge, no mature content management solution natively creates Google and Yahoo! site maps. However, a reasonably skilled developer should be able to extend any CMS with an open content model to produce these maps in the required format. The key thing is to trap content updates to make sure that specialized site maps are updated accordingly.
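To give a sense of how small such an extension can be, here is a minimal sketch in Python. It assumes the CMS can hand you a list of published URLs with their last-modified dates. The function name build_sitemap and the use of the sitemaps.org namespace are our assumptions for illustration; check the format each search engine currently requires.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol (an assumption about the target format).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    last_modified is an ISO date string such as "2006-08-20", or None.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL text.
        ET.SubElement(entry, "loc").text = url
        if last_modified:
            ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "2006-08-20"),
        ("http://www.example.com/products/", None),
    ]))
```

In practice you would call something like this from whatever publish or update event your CMS exposes, so that the XML file is regenerated whenever content changes.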
Mandate Search Engine Friendly URLs
There are two major models for content management solution architectures:
- Static Publication. In this model, content management is separated from content delivery. CMS activities -- authoring, editing, workflow -- take place on one server (usually inside the firewall). The CMS publishes static HTML files from its repository and deploys them to a separate web server that issues these static HTML pages to visitors. These systems do not natively add any dynamic variables (unless you add them separately), and typically allow very tight editorial control over directory and file names.
- Dynamic Publication. Dynamic CMS systems assemble content "on the fly" to create a page as it is requested by a visitor. Content management and delivery activities take place within the same system. These systems often generate long URLs littered with dynamic variables. Some of these systems provide work-arounds for this challenge by allowing URL aliases that appear as standard URLs.
We'll spend a lot of time here because there are several different considerations.
Directory structure may matter. There is some evidence that the "deeper" in your site a page resides, the lower the importance a search engine will place on the page. On the other hand, some usability specialists indicate that semantic directory structures help visitors "wayfind" through your site. In any case, a mature CMS should allow you to organize content hierarchically within the CMS independent of its physical location on the web server. With these systems you can maintain a much more complex system of organization within the CMS than is apparent in the directory structure of the live web site, thereby publishing to a shallower directory structure if that helps your rankings.
A related factor is the extent to which your existing URL structures can be maintained in your new system. Many CMS vendors underestimate the search engine rankings that their customers have built up over time. Sudden changes to longstanding URLs can reduce rankings in some cases.
To the extent that any aliasing involves redirects, make sure they are permanent (301) redirects from the alias URLs to the primary URL. The major search engines do not penalize permanent redirects, but they will penalize other types of redirects. In particular, you want to avoid systems that create two separate instances of a page that have identical content but are addressable from different URLs. Duplicate content risks the appearance of "spamming" the search index.
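The idea can be sketched in a few lines of Python. The alias table, its example paths and the resolve function are hypothetical; a real CMS or web server would supply its own mapping and its own way of issuing the response.

```python
from http import HTTPStatus

# Hypothetical alias table: legacy path -> primary path in the new CMS.
ALIASES = {
    "/products.asp?id=12": "/products/widgets/",
    "/about_us.htm": "/about/",
}


def resolve(path, aliases=ALIASES):
    """Return (status, location) for a requested path.

    Legacy paths answer with a permanent (301) redirect, so each page
    stays addressable at exactly one URL.
    """
    target = aliases.get(path)
    if target is not None and target != path:
        return HTTPStatus.MOVED_PERMANENTLY, target
    return HTTPStatus.OK, path


if __name__ == "__main__":
    print(resolve("/about_us.htm"))
    print(resolve("/about/"))
```

The point of the design is that the old URL never serves content itself; it only points to the one primary address.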
Dynamic URLs in and of themselves probably do not reduce your rankings, though some SEO specialists argue for keeping variables in the URL string to a minimum. In all cases, though, you want to avoid session variables. Session variables are most frequently employed as an alternative to cookies. They "hold state" allowing the system to track one visitor throughout a visit. If your CMS is assigning session variables you have one safe option and one potentially risky approach:
- Investigate replacing session variables with cookies as a means of holding state. Many CMS packages provide this as a configurable option. Be aware that a considerable and growing percentage of visitors refuse cookies. As a result, this approach may not be viable for systems that depend heavily on holding state (such as some shopping cart systems). In that case, determine whether the use of session variables can be restricted to those parts of the site that absolutely require holding state.
- In the risky approach, you custom-configure the system to identify inbound search spiders (by their IP address or User-Agent) and consistently assign the same session variable. This overcomes the session variable challenge by ensuring the search engines recognize the persistence of a page over time. It is risky because any time you treat a search engine spider differently than human visitors you open yourself to accusations of "cloaking," a decidedly black hat SEO tactic, risking a permanent ban from listings.
Best of all is to avoid content management systems that natively insert session variables.
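When evaluating a system, it can help to audit the URLs it publishes for session variables. The following Python sketch is one way to do that. The parameter names listed are common conventions rather than a complete list, and the function name is our own.

```python
from urllib.parse import parse_qsl, urlsplit

# Parameter names commonly used for session IDs; adjust for your CMS.
SESSION_PARAMS = {"phpsessid", "jsessionid", "sessionid", "sid"}


def has_session_variable(url):
    """Return True if the URL appears to carry a session identifier."""
    parts = urlsplit(url)
    query_names = {
        name.lower()
        for name, _ in parse_qsl(parts.query, keep_blank_values=True)
    }
    # Some servers append ";jsessionid=..." to the path rather than the query.
    in_path = ";jsessionid=" in parts.path.lower()
    return in_path or bool(query_names & SESSION_PARAMS)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/page.php?PHPSESSID=abc123",
        "http://www.example.com/news/?page=2",
    ):
        print(url, has_session_variable(url))
```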
Eliminate Broken Links
Broken links can reduce the authority and therefore potentially the ranking of your content in search engine indexes. Many content management solutions manage internal links as discrete objects and will not publish content that contains invalid links to other content maintained by the system, or simply strip links that are no longer valid. At the same time, these internal validations can also reduce publishing performance and interrupt workflows. In those events, you may wish to regularly employ a separate link-checker, which conveniently may also be able to validate external links.
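A separate check of internal links does not need to be elaborate. The Python sketch below assumes you can export the set of paths the CMS has published; the class and function names are hypothetical, and only site-relative links (those beginning with "/") are examined. Checking external links would require actual HTTP requests and is left out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def broken_internal_links(html, published_paths):
    """Return site-relative links that do not match any published path."""
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for href in collector.links:
        # Skip external links, including protocol-relative "//host/..." ones.
        if not href.startswith("/") or href.startswith("//"):
            continue
        # Ignore any fragment or query string when comparing paths.
        path = href.split("#", 1)[0].split("?", 1)[0]
        if path not in published_paths:
            broken.append(href)
    return broken
```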
Use Robots.txt Appropriately
Several CMS solutions allow end users to control the robots.txt on a page-by-page basis, but most leave its definition to the site developer.
Having a robots.txt file is not absolutely mandatory -- its absence is most notable for generating 404 errors in server logs and messing up web log analysis tools. If you choose to implement the file, however, it is critical that it is valid and accurately defines access to site content. A mistake in implementation can prevent the major search engines from indexing some of your site content.
In any case, you can register the robots.txt file as an asset in your CMS and then prohibit unauthorized staff from making alterations to this file without approval. One approach is to define a workflow for the robots.txt file that requires review by the system administrator.
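A reviewer can also test a proposed robots.txt before approving it. The Python sketch below uses the standard library's urllib.robotparser to list any pages that must remain crawlable but would be blocked. The function name and the example paths are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, must_be_crawlable, user_agent="*"):
    """Return the paths in must_be_crawlable that robots_txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path for path in must_be_crawlable
        if not parser.can_fetch(user_agent, path)
    ]


if __name__ == "__main__":
    proposed = "User-agent: *\nDisallow: /admin/\nDisallow: /products/\n"
    # Any path printed here would be hidden from search engines.
    print(blocked_paths(proposed, ["/products/widgets.html", "/about/"]))
```

A check like this could run as a step in the approval workflow, rejecting any change that blocks a page on the must-be-crawlable list.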
Template-Level Considerations
Reduce Code Clutter
Increasing the clarity and prominence of the text on a page is one of the simplest, most-effective SEO tactics.
Most mature content management solutions provide the choice of using cascading style sheets (CSS) to control the format of a page. Making use of a CSS based design -- and then prohibiting the modification of this design -- will eliminate much of the HTML code that would otherwise be required.
If you choose to use JavaScript, ensure that the code is incorporated into the site as an include file rather than in the body of the page. By defining this at the template level, you can ensure the tactic is deployed throughout the site.
Create Site Navigation as Descriptive Text
When creating the design templates for your site, ensure that the navigation elements -- those links that structure access to the site -- are not images or Macromedia Flash objects, but simple text links. Enforce this navigation through the use of the CMS template subsystem.
Many content management solutions allow you to automatically generate "breadcrumbs" that indicate to visitors where they are in the site. Use this feature to generate a consistent keyword-rich text element on each page.
Ensure Links Can Be Processed By Spiders
When creating the templates, do not embed URLs in JavaScript or Flash. Some CMS packages ship with dynamic navigation modules that are JavaScript-based. Avoid these.
Cascading Style Sheets can be used to re-create many of the effects developers have traditionally depended upon JavaScript to perform. With careful development, you can maintain standard text links while benefiting from roll-overs, and fly-out menus and sub menus.
Page-Level Factors
Create Effective Title Tags / Use of Keywords in URLs
You probably know by now that title tags are one of the most important considerations in ranking. Ensuring consistent use of title tags, meta descriptions and keywords in URLs may require modifications to the editorial screens of your CMS. Of course, titles also matter for human readability and not just search results, but an enterprise that is really focused on search results would modify its CMS and editorial processes such that:
- Authors would be forced to enter three key search phrases relevant to the document being created before the document is created. They would be instructed to enter these terms in the order of their importance as search terms.
- The CMS would automatically include these three phrases as the first terms in the title tag of the page.
- The CMS would automatically save the page with a filename that includes the first search term.
Not all CMS systems can do this, particularly the third item (automatic filenames). Developer customization is usually required.
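For developers weighing that customization, the three rules above can be sketched in Python as follows. The function names, the " | " separator and the ".html" extension are our assumptions; the source of the phrases would be whatever field you add to the authoring screen.

```python
import re


def slugify(phrase):
    """Lowercase a phrase and join its alphanumeric words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", phrase.lower()))


def validate_phrases(phrases):
    """Require exactly three non-empty search phrases, most important first."""
    cleaned = [p.strip() for p in phrases]
    if len(cleaned) != 3 or not all(cleaned):
        raise ValueError("exactly three non-empty search phrases are required")
    return cleaned


def build_title(phrases, page_name):
    """Place the three search phrases first in the title tag text."""
    return " | ".join(validate_phrases(phrases) + [page_name])


def build_filename(phrases, extension=".html"):
    """Name the file after the first (most important) search phrase."""
    slug = slugify(validate_phrases(phrases)[0])
    if not slug:
        raise ValueError("first search phrase has no usable characters")
    return slug + extension
```

For example, the phrases "Content Management", "SEO tips" and "CMS templates" with the page name "Acme Corp" would produce the title "Content Management | SEO tips | CMS templates | Acme Corp" and the filename "content-management.html".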
As an alternative, you can provide detailed instructions to content authors within the editing framework -- assuming that your CMS allows you to easily modify the editorial UI. (If not, perhaps you can find a hack; for example, substitute a graphic containing instructions for an existing image in the editor.) Authors should be instructed to select search terms, use these in naming the file and include them in the title tag.
Above all, you want to prohibit duplicate titles. Adopting the approach recommended above for automatically generating title tags from author-specified search phrases should reduce the likelihood of duplicate titles.
If duplicate titles are a significant concern -- and your CMS does not include a sufficiently sophisticated dupe checker -- consider this workaround: automatically include some other reasonably unique variable (e.g., publication date and time) in the title. This can ensure that no two titles will be exactly the same. It is far from an ideal solution but it is one that most content management solutions will support.
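Here is a minimal Python sketch of a variation on that workaround: it appends the publication date and time only when the title collides with an existing one, rather than on every page. The function name and the date format are assumptions, and the comparison ignores case and surrounding spaces.

```python
from datetime import datetime


def unique_title(title, existing_titles, published_at):
    """Return title unchanged if unused, else append the publication time."""
    normalized = {t.strip().lower() for t in existing_titles}
    if title.strip().lower() not in normalized:
        return title
    # The timestamp makes an exact repeat of the title very unlikely.
    return f"{title} ({published_at:%Y-%m-%d %H:%M})"


if __name__ == "__main__":
    existing = {"SEO Tips"}
    print(unique_title("SEO tips", existing, datetime(2006, 8, 20, 9, 30)))
    print(unique_title("CMS Basics", existing, datetime(2006, 8, 20, 9, 30)))
```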
Encourage Effective Meta Tags
While search engines no longer appear to use description meta tags in page ranking, they will often employ them in search results (as should your site search engine!), so they are an integral part of any SEO strategy.
Mature content management solutions allow you to require authors to define meta tags including the description tag. Use the CMS to require meta description entry. If possible, insert instructions in the editorial UI to help authors include a compelling call to action as part of the description tag.
Mark Up Content Effectively
Most mature content management solutions allow you to specify the controls available to authors within the tool's rich-text (WYSIWYG) editor. Customizing these controls -- and instructing users carefully in their prescribed use -- is critical to SEO effectiveness.
In particular, you'll want to enable <h1>, <h2> and <h3> headers in the rich-text editor. Then use cascading style sheets to control how the header tags appear to visitors. Disable the font size control. This will force contributors to make use of header tags to control text size.
Enforce Image Alt Attributes
Search engines index image Alt attributes. Most mature content management solutions can enforce the definition of Alt attributes for images, either by the author during page creation or in the digital asset library during image upload.
If possible, incorporate instructions into the CMS interface that instruct authors to provide detailed descriptions of images suitable for indexing by search engines.
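If your CMS cannot enforce this itself, a check along the following lines could be run against content before publication. This Python sketch is an assumption-laden illustration: the names are ours, and it treats an empty Alt attribute as missing, which you may want to relax for purely decorative images.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Record the src of every <img> that lacks a non-empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            alt = attributes.get("alt")
            if alt is None or not alt.strip():
                self.missing.append(attributes.get("src") or "(no src)")


def images_missing_alt(html):
    """Return the src values of images without usable alt text."""
    checker = AltChecker()
    checker.feed(html)
    return checker.missing
```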
Avoid Spelling Errors
Most content management solutions incorporate a spell checking feature, either as part of the core product (typically within their rich text editor) or as an optional module.
If your CMS incorporates a spell checker, make its use mandatory before authors submit documents to workflow. If your system does not include a spell checking feature, consider incorporating one of the open source spell checking products now available (perform a search at http://www.sourceforge.net to identify an appropriate solution).
Prohibit Duplicate Content
Major search engines will penalize duplicate content. Avoiding duplicate content is important, but also difficult. Authors will sometimes use existing content as a template, make only minor modifications, and then publish this as a new page. (Even more difficult to detect, competitors or well-meaning partners may "borrow" content from your site, but that's another issue.)
To the best of our knowledge, no widely-available CMS offers duplicate content detection as an out-of-the-box feature. If your CMS is PHP-based, consider integrating Duplicate Check (http://www.duplicatecheck.com) as part of the workflow process.
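For other platforms, a developer could approximate duplicate detection with a simple text-similarity measure. The Python sketch below compares overlapping three-word sequences ("shingles") between two documents. This is our own illustration, not how Duplicate Check works, and the 0.8 threshold is an arbitrary assumption that would need tuning against your content.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, threshold=0.8):
    """Flag two texts whose similarity meets or exceeds the threshold."""
    return similarity(a, b) >= threshold
```

A workflow step could compare each new submission against recently published pages and route close matches to an editor for review.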
So here, as in so many SEO areas, the key is adequately sensitizing authors to the ranking implications of their work.
In any case, good luck! Hope to see you in Google.
This article is adapted from the non-linear creations white paper SEO and CMS: Implementing Best Practices.