
Do You Really Own Your Web Analytics Data?

By Tony Byrne and Phil Kemelor at 2009-03-16 16:59:00

Your website analytics solution generates a lot of data, potentially gigabytes a day if you run one or more busy sites. But who really owns all that rich data? It's a complex issue that often gets overlooked during web analytics vendor selection and contract negotiations. As more customers turn to SaaS-based solutions (where the vendor stores your traffic data) and as Google and Yahoo continue to broaden this marketplace with their free hosted analytics offerings, the question of data ownership becomes increasingly germane.

The latest Facebook kerfuffle brings these issues into greater relief. (To summarize the conflict: Facebook briefly granted itself explicit rights to maintain certain pieces of member data after a member departs.) Last month, Facebook clarified its stance: "Our philosophy is that people own their information," stated CEO Mark Zuckerberg. Members are just licensing that data to Facebook to use. On the plus side, this episode got people looking more closely at Terms of Service (ToS) documents. It turns out Facebook retains significant rights to your data.

Unfortunately, many analysts and web managers we encounter at large enterprises either don't read or don't have access to their vendor service terms, and generally don't ask about data ownership during the vendor evaluation process. Most web analytics customers just assume that -- as with their individual Facebook information -- they fully own their web analytics data and are just granting a limited license to the vendor to generate reports. Depending on what "full ownership" means to you, that may not be totally true.

Comparing Four Vendors

Hosted web analytics vendors are providing you a service by ingesting, aggregating, and parsing your traffic data, which they accumulate on your behalf. As part of our research for the Web Analytics Report, we examined agreements from Google and Yahoo as well as contracts submitted privately from customers of two traditional fee-based web analytics providers -- call them Vendor X and Vendor Y.

The two fee-based vendors explicitly confirm your ownership of the data.

Here are Vendor X's terms:

    "As between [Vendor X] and Customer, Customer exclusively owns all rights, title and interest in and to all Customer Data. Customer Data is deemed Confidential Information..."

And Vendor Y too:

    "Customer Data, other Customer Confidential Information and any other Customer information and materials, and all worldwide intellectual property rights in the foregoing, are your exclusive property."

And what about the free services? Google's ToS (to their credit, publicly available online) also refers to the collected data as "customer data." But the agreement doesn't clarify whether "customer" in this context means you rather than your website visitors. Yahoo's terms are silent on the topic of data ownership.

What is Ownership?

Yet what does data ownership really mean?

One legal dictionary defines ownership, alternately, as "Legal title coupled with exclusive legal right to possession," or "The right by which a thing belongs to some one in particular, to the exclusion of all other persons." When a SaaS vendor says you own your data, what rights are you still conveying and withholding?

Within the enterprise, most companies have this sorted out pretty clearly: all data belongs to "the employer." In fact, the fear of conferring ownership to an employee or any other entity spawned the phrase "data stewardship," rather than "data ownership." You may work with enterprise data for eight hours every day, but you are only the steward of that data; your employer actually owns it.

We think that by breaking ownership down into more specific dimensions you can get a better sense for what truly matters for your enterprise:

  • Data usage
  • Data security and availability
  • Data retention and disposition
  • Data access

Let's look at each aspect in turn.

Use of Your Data

Both Google and Yahoo have built powerful platforms, but as Scott Karp put it, "The real value is in the data." In exchange for the free service, they both give themselves expansive usage rights to your data.

Here's what Yahoo says:

    "As a condition of using Analytics, you will: (i) obtain on behalf of the Yahoo! Entities all rights and permissions necessary for the Yahoo! Entities to use the Analytics data, including statistical and traffic information collected by us and/or provided by you..."

Yahoo mandates that you put strict notice of this in your own website's privacy statement, including this clause:

    "...(iii) a statement that expressly identifies Yahoo! and its use of the Analytics data to improve Yahoo!'s products and services and to provide advertisements about goods and services that may be of interest to end users..."

Your privacy statement must also link visitors to an Analytics opt-out form.

Google is equally vague:

    "Google and its wholly owned subsidiaries may retain and use, subject to the terms of its Privacy Policy (....), information collected in Your use of the Service."

But Google does at least severely restrict third-party access to the data.

Google and many fee-based analytics vendors will privately combine your data with that of other customers for benchmarking or industry-average information -- and then share those reports with you -- but you can typically opt out of this.

Accumulating data across customers relates to the retention and disposition topic below. If you leave the service, what data usage rights does the vendor retain? Recall this is what got Facebook into trouble. Some analytics vendors may purge your raw data but still keep your aggregate information to inform their benchmarking warehouse.

Data Security and Availability

All of the largest web analytics vendors go to great lengths to protect your data from unauthorized access. You may get careless with passwords, but that's your problem. Of course, to the extent that most data thefts are inside jobs, those vendors with more fine-grained access controls (hint: Google is not among them) may provide a greater degree of safety in this regard.

As with security, most (though not all) web analytics vendors invest in back-up, redundancy, and failover systems for optimal availability. Yet if there's a data loss, you're on your own. In cases where you don't have access to the raw data, you may never even know about a blip unless the roll-up reports tip you off. All agreements that we've seen absolve the vendors of any liability here. So, in this case, what does data ownership really do for you? Not much, unless you have access rights and retrieve it regularly. More about that, below.
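If you do retrieve your data regularly, even a simple check can reveal a blip. The sketch below assumes you keep a list of the dates covered by your own exported daily summaries and reports any days missing from that range. The sample dates are made up and the function name is hypothetical.

```python
from datetime import date, timedelta


def missing_days(covered_dates):
    """Return the dates absent between the earliest and latest covered date."""
    covered = set(covered_dates)
    if not covered:
        return []
    day, last = min(covered), max(covered)
    gaps = []
    while day <= last:
        if day not in covered:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Hypothetical export log: 12 and 13 March are missing.
EXPORTED = [
    date(2009, 3, 10),
    date(2009, 3, 11),
    date(2009, 3, 14),
    date(2009, 3, 15),
]
GAPS = missing_days(EXPORTED)
```

A check like this only catches whole missing days; partial losses within a day would need row counts or similar totals to compare against.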

To be sure, we don't know of any instances of major data loss in this marketplace. (We've seen service outages, but that's a different issue.) It's worth noting, though, that just such a nightmare befell users of the Ma.gnolia bookmarking service (a competitor to Del.icio.us) last month. A critical failure of the main and back-up data stores wiped out everyone's bookmarks. Ma.gnolia was not a large, commercial vendor, but the fact remains that sometimes the cloud can fail you.

Vendor X's terms and conditions in this regard seem instructive:

    "[Vendor X] cannot guarantee that any Customer Data Customer stores or transmits through the Service will not be subject to unauthorized access by others or that others will not gain access to the Service. [Vendor X] performs regular system-wide back up procedures for the Service, however Customer understands that there is an inherent risk in electronic storage and agrees to rely solely on its own back up copies of any Customer Data stored in or transmitted through the Service should the Customer Data become lost or damaged for any reason. At no time and for no reason will [Vendor X] be responsible for recovering or retrieving any Customer Data stored and/or transmitted by Customer using the Service unless such recovery or retrieval results from an event or occurrence that requires a Service-wide restoration (which shall be at [Vendor X's] sole determination.)"

Data Retention & Disposition

There are basically two types of web analytics data you can retain: data that is used in creating summary tables that form the basis of the fancy reports you retrieve, and unaggregated source data -- complete records of all captured activity for each individual visit. Think of this distinction as summaries versus raw data.
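To make the distinction concrete, here is a minimal Python sketch that rolls a handful of raw hit records up into a per-page daily summary. The record fields (`visitor_id`, `page`, `timestamp`) and the sample rows are hypothetical, not any vendor's actual schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw hits: one record per page view (not a real vendor schema).
RAW_HITS = [
    {"visitor_id": "v1", "page": "/home", "timestamp": "2009-03-16T09:00:00"},
    {"visitor_id": "v1", "page": "/pricing", "timestamp": "2009-03-16T09:02:00"},
    {"visitor_id": "v2", "page": "/home", "timestamp": "2009-03-16T10:15:00"},
    {"visitor_id": "v2", "page": "/home", "timestamp": "2009-03-17T08:30:00"},
]


def summarize(hits):
    """Aggregate raw hits into (date, page) -> page views and unique visitors."""
    views = defaultdict(int)
    visitors = defaultdict(set)
    for hit in hits:
        day = datetime.fromisoformat(hit["timestamp"]).date().isoformat()
        key = (day, hit["page"])
        views[key] += 1
        visitors[key].add(hit["visitor_id"])
    return {
        key: {"page_views": views[key], "unique_visitors": len(visitors[key])}
        for key in views
    }


SUMMARY = summarize(RAW_HITS)
```

Once only the summary survives, a question such as "which pages did visitor v1 see, and in what order?" can no longer be answered, which is why the two kinds of data are worth negotiating separately.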

Vendors will typically retain these two types of data for different periods of time. Raw data gets unwieldy quickly, so they may keep it for only a brief period (long enough to aggregate it) and commit contractually only to formal retention periods for your aggregate data.

Most vendors start with default retention terms. For example, they may grant one month for raw data, three months for aggregate summaries. If you want more than the default, you have to pay for it.

Consider WebTrends' standard retention schedule. By default, the hosted service retains report data for four months, unless you upgrade to "Extended Data Retention" for thirteen months. Raw data gets stored for 14 days.
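As a rough illustration, a schedule like this can be turned into cut-off dates. The sketch below assumes the figures quoted here (14 days of raw data, four months of report data, thirteen months with the upgrade) and treats a "month" as a calendar month. The function names are hypothetical, and the vendor's actual rules may differ.

```python
import calendar
from datetime import date, timedelta

# Figures taken from the schedule described above.
RAW_RETENTION_DAYS = 14
REPORT_RETENTION_MONTHS = 4
EXTENDED_REPORT_RETENTION_MONTHS = 13


def subtract_months(day, months):
    """Return the date `months` calendar months before `day`, clamping the day of month."""
    index = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def oldest_retained(today, extended=False):
    """Return the earliest dates still covered for raw and report data."""
    months = EXTENDED_REPORT_RETENTION_MONTHS if extended else REPORT_RETENTION_MONTHS
    return {
        "raw": today - timedelta(days=RAW_RETENTION_DAYS),
        "report": subtract_months(today, months),
    }


CUTOFFS = oldest_retained(date(2009, 3, 16))
EXTENDED_CUTOFFS = oldest_retained(date(2009, 3, 16), extended=True)
```

Under these assumptions, anything you have not exported before the cut-off date is gone, so the export schedule needs to run more often than the shortest retention window.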

As e-discovery specialists know, retention could theoretically matter for legal or regulatory reasons. If you wanted to go back and prove or disprove that a visitor came to your site and accessed certain pages on a particular date and time, you might need to review the raw traffic data. (That data may still prove inconclusive regarding an individual visitor session, but that's another story.)

Then you also have to consider the issue of disposition. What happens to your raw data, let alone your aggregate data, when you leave a service? If the vendor says they've deleted it, is it really gone? Facebook got into hot water for the vague way that it dealt with this issue.

But the more important issue here is that you should not take perpetual ownership for granted. Unless you negotiate otherwise, at some point you no longer own your data (raw or aggregated) -- because it will no longer exist.

Data Access

Most web analytics vendors counter that you can export your data at any time. However, not every vendor lets you export your raw data, and not always all of it at one time. You might have noticed in the WebTrends terms above that access to raw data costs extra, and carries certain limitations.

Given the size of these datasets, you can understand some of those limitations. Sometimes vendors will request that you perform large data exports only at certain times, and only in a certain format (like CSV files) that may or may not prove convenient. And of course, you need to figure out how to store all that exported data yourself, but then at least you know you have it.
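One way to be sure you "know you have it" is to record a row count and checksum for each export as you archive it. This is a minimal sketch using only the Python standard library. The file name, column names, and manifest layout are hypothetical, and a real export would come from your vendor rather than being written inline.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def archive_export(csv_path, manifest_path):
    """Record the row count and SHA-256 digest of an exported CSV in a JSON manifest."""
    csv_path = Path(csv_path)
    manifest_path = Path(manifest_path)
    data = csv_path.read_bytes()
    with open(csv_path, newline="") as handle:
        # DictReader consumes the header line, so this counts data rows only.
        row_count = sum(1 for _ in csv.DictReader(handle))
    entry = {
        "file": csv_path.name,
        "rows": row_count,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    manifest = []
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())
    manifest.append(entry)
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return entry


# Stand-in for a vendor export (hypothetical columns and values).
workdir = Path(tempfile.mkdtemp())
export = workdir / "hits-2009-03-16.csv"
export.write_text("visitor_id,page\nv1,/home\nv2,/pricing\n")
ENTRY = archive_export(export, workdir / "manifest.json")
```

Re-computing the digest later and comparing it with the manifest tells you whether an archived file has been altered or truncated since you stored it.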

Many customers need more regular access to their data. Enterprises are increasingly looking to integrate online data with offline information. Or they may want to run custom queries, perhaps against raw data, that can't be assembled using the vendor's report-building services. Larger vendors have responded by offering "data warehouse" functionality -- a fancy term that will lead to some suitably expansive fees. The idea here is that the vendor offers an API to the underlying raw data which you can use to get programmatic access. That is, you can run queries, and regularly extract just the data you need. Here again, larger data extractions may queue up at a vendor and their execution could get measured in days.

But the larger point is this: if "owning" your data means ad-hoc access to arbitrary slices of it on an ongoing basis, you'll almost surely have to pay extra for that privilege.

What You Should Do

So, if you go with a hosted web analytics service, you'll want to figure out what elements of data ownership are most relevant to your enterprise. If you run an SMB site where Google provides decent analytics services for free, then you may be quite willing to forgo some traditional privileges of ownership. For larger enterprises, making sure that you can get your hands on your data when you need it and with as little interference as possible seems like a prime consideration.

Above all, you should clarify contractually that your web analytics vendor is simply a steward of your data for the purposes of ingest, mining, and reporting, for however long you two agree for them to play that role. You may grant them additional rights, but at the end of the day, the data should be yours.

Copyright, 2001 - 2010, Real Story Group. All rights reserved.
