Converging Content and Data
Love Your Local Data Warehouse Manager
by Tony Byrne
17-Apr-2008

For a while there, it seemed like everyone was talking about information "convergence" -- the idea that the data and content worlds were coming together, perhaps over a shared interest in search, text mining, records management, whatever. Examples of mashing up data and content abound, from retrieving customer comments in CRM systems, to data-enabling a SharePoint collaboration portal. Convergence even headlined an AIIM Expo keynote a year ago.
I obtained some new perspective on the difficult task of bridging the two worlds when Alan and I had the opportunity this March to lead some seminars at the Data Warehouse Institute World Conference. We were teaching about unstructured information (a.k.a., "content") management to "data" managers. The famous gulf between data people and content people sometimes becomes cliched, but I think it's real.
Convergence won't come in the guise of new technologies. It will come, first and foremost, through changing attitudes and more communication. In particular, I think content specialists -- webmasters, document managers, archivists, content analysts, et al. -- could really help themselves by reaching across the divide to ally with data managers.
Content vs. Data
Of course, at some level, all electronic information is "data." But in this context I'm not referring to where and how enterprise information is persisted at the level of ones and zeros. We can all agree, for example, that sometimes you should organize at least text content (e.g., HTML) in a relational database of the type that stores your enterprise-critical data.
The bigger challenge comes in the broader division of information über-types. On one side we have data -- facts and figures typically expressed in tables. On the other side we have content -- information in context, usually represented as documents. If you want to get fancy, you could point out that many documents are in fact "structured," though not always in a way that makes them conducive to tabular representation in a database.
Content Girl and Data Guy
Please indulge some stereotypes here. To personify the cultural and technical divide between the business analysts who define and drive our important information systems, let's bring out Content Analyst Girl and Data Analyst Guy. Seemingly, they only share a middle name.
Content Girl's defining professional experience was the rise of the Web, or perhaps an earlier installation of a Document and Records Management system like FileNet. She calls herself an "information architect," thinks in terms of folders and tagging, and works on ECM and WCM systems. She repeatedly has to explain and justify what she does, sometimes invoking the old canard about 80% of all information being unstructured. Her work is never done.
Data Guy may have come out of the mainframe modernization era or the client-server transition at his firm. He calls himself a data architect, thinks in terms of tables and data cubes, and works on data warehouses around the Business Intelligence (BI), Enterprise Resource Planning (ERP), and Customer Relationship Management (CRM) systems that form the life-blood of many large enterprises. He knows they're the life-blood. (His bosses know it too.) So he's careful with the data. Compared to Content Girl, he spends a lot of money to clean, merge, parse, and extract meaning from that data. His work is never done.
What Data Guy Can Offer
If you're Content Girl, Data Guy can potentially become your ally, and vice versa. Consider:
Data Guy is not afraid of large, enterprise projects. He'll likely confirm your intuition that it's worth picking and choosing your integration battles.
This is partly because Data Guy understands foundational scaling, migration, and information quality challenges, as well as associated stewardship and governance issues. This doesn't mean he's figured all that hard stuff out -- just that he may have more experience and generally more mature structures and processes to share here. (I could opine about the subtle differences between data stewardship and content governance, but let's leave that for another day.)
Like it or not, Data Guy gets asked by line managers to solve unstructured content problems too, particularly lightweight document management. Hopefully, he's been around the block enough to know that defaulting to SharePoint is a stop-gap at best.
When Data Guy thinks about "structured information," he tends to think in rows and tables, rather than the hierarchical structures that come naturally to most content people. This is an old stereotype, but I think it's still true. Folder metaphors can help educate here.
Data Guy will likely recognize the performance and security issues that bedevil enterprise search projects targeting unstructured documents because he runs into the same problems when collecting, securing, and querying data stores. What will surprise him, perhaps, is the sheer volume and accumulation rate of searchable documents.
Data Guy has to lead migrations all the time, not just the every-three-to-five-years we see in the content world. He can share advice (and tools!) about cleaning, normalizing, and moving information from one system to another -- although your crazy quilt patchwork of HTML code and Office files may well make him dizzy anyway.
Data Guy gets besieged by vendors trying to sell him on "portal" products that are really just dashboards on top of point solutions. As such he's a potentially very useful partner in understanding enterprise portals as true integration platforms, and can help you take your underwhelming intranet to the next level by exposing useful data services to everyday managers. (This might ultimately require true enterprise portal software.)
Data Guy knows all about metadata. Oh yes, he's someone you can geek out with about taxonomies and controlled vocabularies. In fact, you'll need to harmonize metadata first and foremost if you want to "converge" information within your enterprise.
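To make "harmonizing metadata" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the field names (`author`, `created_by`, and so on), the shared vocabulary (`creator`, `title`, `subject`), and the `harmonize` helper are hypothetical and don't come from any particular content or data system. The idea is simply that each side maps its own labels onto one agreed set of terms.

```python
# Hypothetical field mappings; real systems will have their own schemas.
CONTENT_TO_SHARED = {"author": "creator", "doc_title": "title", "tags": "subject"}
DATA_TO_SHARED = {"created_by": "creator", "name": "title", "category": "subject"}


def harmonize(record, mapping):
    """Rename known fields to the shared vocabulary.

    Fields with no agreed mapping are kept under 'unmapped' so that
    nothing is silently dropped while the two teams are still negotiating.
    """
    shared, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            shared[mapping[key]] = value
        else:
            unmapped[key] = value
    if unmapped:
        shared["unmapped"] = unmapped
    return shared


# One made-up record from each world.
doc = {"author": "C. Girl", "doc_title": "Q1 Returns Policy",
       "tags": "returns", "folder": "/policies"}
row = {"created_by": "D. Guy", "name": "Q1 Returns by Region",
       "category": "returns"}

doc_meta = harmonize(doc, CONTENT_TO_SHARED)
row_meta = harmonize(row, DATA_TO_SHARED)

# Once both records use the same terms, they can be matched on a shared field.
same_subject = doc_meta["subject"] == row_meta["subject"]
```

The code is the easy part. The real work is the conversation that produces the two mapping tables, which is exactly the kind of meeting this article is arguing for.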
Data Guy and Your Web Metrics
Data Guy can also be a good ally for your Web Analytics efforts. After all, we're talking mostly about data here.
He's increasingly interested in incorporating web metrics into broader enterprise performance management systems.
He shares your awe about the huge, recurring volumes of new traffic data your website generates.
He was initially suspicious of the heavy prevalence of hosted ("SaaS") web analytics services, but then came to see the value of SaaS suppliers. It's just that he's going to ask hard questions about data access APIs.
He's not intimidated by the problem of heterogeneous data sets residing in multiple locales (e.g., your webserver log files, web analytics database, e-mail marketing system, web application transaction logs, search engine marketing reports, and so on). Again, he won't make aggregation easy; it's just that he's seen this problem before and comes armed with tools and methodologies to address it.
Getting Together
Of course, Data Guy might not want to cooperate. So be it. However, the data people I've known over the years are getting increasingly interested in documents. It's not idle curiosity; their jobs are demanding it. Content Girl, you have quite a few things to teach him here, too.
You can reverse genders and say Data Girl and Content Guy -- it doesn't matter: the data/content divide becomes no less real. Yet I believe the gulf is largely cultural and partly semantic. At the end of the day, convergence will not come via technology, but through cooperation. You might not find any pressing business need to mix data and content, but you'll still end up all the smarter by looking. Even a furtive meeting to exchange metadata is a good start...


