Page Views Are Dead! Long Live Page Views! (A Discussion of Page View Alternatives)

August 6, 2006 Dennis D. McDonald

Doomed Page Views

Dave Morgan’s article Doomed Page Views argues that, given the rise of RSS and related feed technology in the online world, measuring “page views” will be obsolete within two years as a way to gauge the reach that a web publisher (and, by extension, the ads served by that publisher) has with an audience.

This is a quote from Morgan’s post:

The demise of the page view means that we will need to focus on other measurements to determine the quantity of content that online media audiences consume. It will focus everyone in our business much more on audience, with the notion of unique audience becoming more important. Engagement–how deeply consumers interact with particular content and with particular ads–will become much more important. The actual results of the ads, whether it be generating leads or sales or requests for information, will become more important.

Morgan approaches this issue from the perspective of advertising and the need to measure advertising’s effectiveness. This appears — at first — to be different from a concern that I have expressed in the past that dependence on an RSS feed for access to the content of a web publisher has profound implications for how an author reaches an audience with recognizable content. We’ll return to this point.

Let’s first consider the concept of page views. In the above quote Morgan states, “… we will need to focus on other measurements to determine the quantity of content that online media audiences consume.”

Presumably he means that “page views” are currently a measure of “quantity of content consumed.” That is, count the number of times a page load is associated with a unique IP address over a given time period, and the resulting count can be seen as a surrogate or proxy for the “quantity of content” that is “consumed” by someone using a device associated with that particular IP address.
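To make the proxy concrete, here is a minimal sketch of that kind of count. The record layout (an IP address paired with a page path) and the sample addresses are assumptions made up for illustration; real analytics packages differ in what they log and how they filter it.

```python
from collections import Counter

def count_page_views(records):
    """Return (total page loads, page loads per IP address).

    `records` is a hypothetical list of (ip_address, page_path) pairs
    covering one reporting period.
    """
    per_ip = Counter(ip for ip, _path in records)
    return sum(per_ip.values()), per_ip

# Illustrative data only (documentation-range IP addresses).
sample = [
    ("192.0.2.1", "/post-a"),
    ("192.0.2.1", "/post-b"),
    ("198.51.100.7", "/post-a"),
]

total, per_ip = count_page_views(sample)
print(total, "page views from", len(per_ip), "unique IP addresses")
```

Note that nothing in the count says whether the two loads from the first address came from one reader or several, which is exactly where the proxy starts to weaken.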

Page View Deficiencies

The deficiencies of such a measure in accurately describing “media consumption” are well known. Some examples:

The same person might access the same page from different IP addresses or from different computers during a given time period.
There is no guarantee that a “page view count” actually corresponds to the user seeing a particular item on a page and/or having it register for purposes of taking an action or making a decision.
There is increasing variability in what constitutes a “page.” First there was the shift from static to dynamic pages, then to database-fed pages; now live content is streamed to local browsers where it is manipulated and displayed (and presumably consumed) outside the observational capability of page view counts.

Morgan now adds the impact that RSS feed technology is having on page view counts. If you regularly blog and scan incoming visit data you’ll see how frequent access to your blog via an RSS feed is.

In my case, my vendor (Squarespace) generates and counts hits from RSS, RDF, and Atom feeds. It tracks them separately from unique visitors (as counted through IP addresses that I have not blocked, such as my own) and from known indexers, robots, and crawlers.
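A rough sketch of this kind of bucketing follows. It is not Squarespace's method, which is not described here; the URL suffixes, the user-agent markers, and the function name `classify_hit` are all assumptions chosen for illustration.

```python
from collections import Counter

# Assumed conventions, not taken from any particular vendor.
FEED_SUFFIXES = (".rss", ".rdf", ".atom")
CRAWLER_MARKERS = ("bot", "crawler", "spider")

def classify_hit(path, user_agent, ip, blocked_ips=frozenset()):
    """Place one hit in a bucket: blocked, crawler, feed, or visitor."""
    if ip in blocked_ips:
        return "blocked"          # e.g. the blog owner's own address
    if any(marker in user_agent.lower() for marker in CRAWLER_MARKERS):
        return "crawler"          # known indexers and robots
    if path.lower().endswith(FEED_SUFFIXES):
        return "feed"             # RSS, RDF, or Atom request
    return "visitor"              # everything else counts as a page visit

# Illustrative hits: (path, user agent, IP address).
hits = [
    ("/blog/index.rss", "FeedReader/1.0", "192.0.2.1"),
    ("/blog/some-post", "Mozilla/5.0", "192.0.2.2"),
    ("/blog/some-post", "Googlebot/2.1", "192.0.2.3"),
]
tally = Counter(classify_hit(path, agent, ip) for path, agent, ip in hits)
print(dict(tally))
```

Even with buckets like these, a "feed" hit says only that the feed was requested, not what the subscriber eventually saw.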

Here are some related questions that occur to me when I’m reviewing my own blog “consumption” data based on feed-supplied hits:

Usually I have no way of knowing what the RSS feed subscriber to my blog is seeing at the other end. Is the RSS feed user seeing a title, the first few sentences, an entire article with or without images and/or links, or what?
Is my name and a link back to my blog associated with the feed as seen at the other end? (I know in some cases it is not.)
Is an individual feed subscriber seeing my feed directly, or is he/she seeing my items inside an aggregation built by a search engine that offers subscriptions to keyword searches run periodically against selected feeds?
How can I tell when someone clicks on a feed link then sees the whole article as displayed through my blog?
How often is what is seen in a feed substituting for the “original article”? (If I had ads on my blog — which I don’t — I would really care about this.)

The questions that Morgan raises have even greater commercial meaning given the relevance to advertising placement, pricing, and ad effectiveness.

Consider the difficulties associated with measuring TV ad viewership. Personal written household diaries generate different data on home TV watching from automatic recording of the displayed channel. Even measures of what channels are being displayed can’t count who has left the room for a bathroom break.

Measurement difficulties also exist with measures of page views, especially when these measures are filtered through RSS feeds and feed aggregation services. As pointed out by Morgan, page views provide an imperfect measure of what “content” is actually being “consumed.” Given continued adoption of RSS feed technology and increasing use of the Web for video distribution, this measurement situation is bound to deteriorate.

The Concept of “Engagement”

The key to how we should be thinking about this situation, I think, is related to the concept of “engagement” that Morgan mentions in the quotation above. An important question is, engagement by whom with what?

This gets us back to the issue of identity, both the identity of what is being distributed, and the identity of who is doing the receiving (or “consuming”). As we see with RSS feeds, there are many potential identity problems that can occur between the time a feed is generated and the time it is received and acted on by the user. Content can be lost, the identity of the creator (or owner) can go missing, and elements can be reassembled or displayed in ways unintended by the original creator. At the same time, the identity of the user can be lost as well, given the vagaries of relying upon IP addresses and other encoded machine or network details for information about the user.

There needs to be a more reliable way to track the engagement between the source and the user. This tracking should take into account that sources and users can become separated physically and from second to second. The fact that so many “non-pages” are now being transmitted and used (YouTube, anyone?) also means that measurement must handle multimedia as well as page-oriented textual information.

This Sounds Like DRM!

One possible solution to measurement sounds similar to how a digital rights management system (DRM) operates. Identity controls are established that cover both the source and target. Intervening network traffic monitoring incorporates a transaction measurement function that, at minimum, counts transactions and keeps track of counts by identities of source and user. Adding ownership and licensing management adds elements of a DRM system. (For simplicity’s sake I have not mentioned encryption/decryption.)
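The minimum transaction measurement function described above, counting transactions by the identities of source and user, might look something like the following sketch. The class name `TransactionLedger` and the identifier strings are hypothetical, and the sketch assumes that verified identities are already available, which is the hard part.

```python
from collections import Counter

class TransactionLedger:
    """Counts content transactions keyed by (source identity, user identity)."""

    def __init__(self):
        self._counts = Counter()

    def record(self, source_id, user_id):
        """Register one delivery of content from a source to a user."""
        self._counts[(source_id, user_id)] += 1

    def count(self, source_id, user_id):
        """Transactions between one source and one user (0 if none)."""
        return self._counts[(source_id, user_id)]

    def total_for_source(self, source_id):
        """All transactions attributed to one source, across all users."""
        return sum(n for (src, _user), n in self._counts.items() if src == source_id)

# Hypothetical identifiers, for illustration only.
ledger = TransactionLedger()
ledger.record("author:example-blog", "user:alice")
ledger.record("author:example-blog", "user:alice")
ledger.record("author:example-blog", "user:bob")
print(ledger.count("author:example-blog", "user:alice"))
print(ledger.total_for_source("author:example-blog"))
```

The counting itself is trivial. Everything difficult lives in how `source_id` and `user_id` are established and trusted, which is where the resemblance to DRM comes in.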

Please note: I’m not recommending DRM as a solution to the problems pointed out by Morgan with using page views to measure online consumption. DRM is already controversial. The idea of doing a better job of tracking the identity of who is responsible for clicking an ad or downloading a page would most likely be a recipe for privacy controversies.

A Voluntary System

My preference would be for a voluntary system in which, as a creator of content, I could be confident that whenever my content is distributed, whether via a push or a pull system, (a) it will be reconstituted at the receiving end in a manner that I find acceptable, and (b) it contains basic information that identifies me as the author, owner, or creator.

At the other end, when I as a user (or “consumer”) obtain access to and read, view, or listen to someone’s content, (a) I can be assured that the content is in fact what the author, publisher, or owner intended, and (b) I can voluntarily provide personal identity information that identifies me to the sender.

Note that I used the word “voluntarily.” In a commercial environment, one way to encourage participation in a “voluntary” system is to provide a benefit (“pay”) to system participants for disclosing certain types of information, such as personal or behavioral information that market researchers could cross-tabulate with descriptions of content that is being accessed online.

Conclusion

Even if technical solutions are available to verifiably and reliably provide identity information for both content and user that can be monitored, the concept of voluntary provision of identity information, coupled with the option to pay people to reveal personal identifying information, would be controversial.

This would be a far cry from the freewheeling World Wide Web that we all know and love. We are asking the Internet to support much more than its inventors ever imagined, and it is showing the strain.

Perhaps the fact that the Web is unreliable in providing details of who uses what is not surprising; I am sure that some will applaud that fact.

On the other hand, I’m equally confident that commercial interests, such as the companies that hire consultants like me, will find business-compatible ways to overcome the deficiencies of the current page view model.

Acknowledgement

I would like to thank my Podcast Roundtable colleague Robyn Tippins for bringing the Morgan blog posting to my attention via one of her links.