Author Archive

Opening Up Our Measurement Process

October 20th, 2014 by Josh

If you’re reading this post, you’ve likely already read our recent announcement about gaining accreditation from the Media Rating Council for measuring a number of metrics, including viewability and active exposure time. You may have also read the Advertising Age feature on the Financial Times’ plan to trade ads on those metrics — which gives a view into why we’re so excited about the possibility of valuing ad inventory based on the amount of time audiences spend with it.

We went through the process of accreditation to bring us one step closer to a world where ads can be valued by the duration for which they’re seen. But the thing is, trading on time only works if:
  1. Both buyers and sellers agree that time is a valuable unit to trade on;
  2. There’s a unified standard for how time should be measured;
  3. Both parties have access to the data and infrastructure on which to trade.
We think the case for transacting on time is clear and compelling: time in front of eyeballs is closer to what advertisers are hoping to buy than impressions (viewable or not) are, and — as one example — Jason Kint made a strong case for why time-based valuation benefits publishers as well. On points (2) and (3), though, we think there’s still a long way to go.

On the measurement side, it’s critically important that — at least while there’s no IAB standard for time-based measurement — measurers be completely transparent about their exact methodologies, so that buyers and sellers can understand exactly what the numbers they’re looking at mean. And, on the product side, even with an expanding set of products and a growing customer base, there’s simply more to be built than we will ever build ourselves, and we think the industry will benefit from as many companies being involved as possible.

To address both of those points, I’m excited to announce today that we’re publicly releasing our Description of Methodology.

This is the main document on which our accreditation is based — it details the exact process of measurement that we use. Insofar as we have any “secret sauce” in our measurements, it’s in that document. Our goal in releasing it is twofold:
  • We think we all will benefit from others’ careful analysis, critique, and improvement upon the techniques we’ve proposed. Our hope is that others will adopt and refine our measurement process, and that the ability of all parties to accurately measure user engagement will improve over time.
  • The entire industry benefits from more people thinking carefully about their numbers, and we want it to be easier for other companies to gain accreditation. When we began our accreditation process, our largest hurdle was simply the fact that we didn’t have good examples of what successfully audited companies’ practices looked like. Our hope is that reading our document helps others down the line make it through in shorter order.
Having spent several years refining our process, there are a few hard-fought bits of knowledge that I wanted to highlight.

Measuring Engagement

Our method of tracking engagement was derived from a set of human-subject experiments, and it comes down to a simple rule: each second, our code determines whether or not the reader is actively engaged with the page and keeps a running counter of that engagement.

In determining engagement, our code asks several questions (a code sketch of this logic follows the list):
  1. Is this browser tab the active, in-focus tab? If not, the user is certainly not actively engaged. If so, continue on to (2).
  2. Has the reader made any sort of console interaction (mouse movement, keystroke, etc.) in the last five seconds? If not, the user is not actively engaged. If so, consider them actively engaged and give one second of credit toward the visitor’s engaged time on the page.
  3. For each ad unit on the page: If conditions (1) and (2) have been met, is this ad unit viewable under the IAB viewability standard? If no, the ad has not been actively viewed this second. If so, give the ad one second of credit toward its active exposure time.
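To make that loop concrete, here’s a minimal sketch of the per-second check in browser JavaScript. It’s an illustration of the rule above, not our production code: the event list, isAdViewable(), and every other name here are placeholders.

```javascript
// A minimal sketch of the per-second engagement check described above.
// Every name here (IDLE_TIMEOUT_MS, isAdViewable, the event list) is an
// illustrative placeholder, not our production code.

var IDLE_TIMEOUT_MS = 5000;   // the five-second interaction window
var lastInteraction = Date.now();
var engagedSeconds = 0;       // running counter for the visitor's engaged time
var adExposure = {};          // ad unit id -> seconds of active exposure

// Any console interaction resets the idle clock.
['mousemove', 'keydown', 'scroll', 'touchstart'].forEach(function (evt) {
  window.addEventListener(evt, function () { lastInteraction = Date.now(); });
});

// Stub: a real implementation tests the IAB display standard (at least
// 50% of the ad's pixels within the viewport).
function isAdViewable(adUnitId) { return false; }

setInterval(function () {
  // (1) Is this the active, in-focus tab?
  if (document.hidden || !document.hasFocus()) { return; }

  // (2) Any console interaction within the last five seconds?
  if (Date.now() - lastInteraction > IDLE_TIMEOUT_MS) { return; }

  // Engaged: credit one second toward the visitor's engaged time...
  engagedSeconds += 1;

  // (3) ...and one second of active exposure to each viewable ad unit.
  Object.keys(adExposure).forEach(function (adUnitId) {
    if (isAdViewable(adUnitId)) { adExposure[adUnitId] += 1; }
  });
}, 1000);
```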
The five-second rule is an approximation (it’s easy to construct cases where a person, say, stares at a screen for a minute without touching their computer), but we believe we’ve collected compelling evidence that it correctly captures the vast majority of time spent, and it provides a good measurement of guaranteed time spent: it’s difficult to construct a scenario in which a visitor is measured as active by our system but isn’t actually looking at the page. It also meets the IAB requirement that a methodology be clear, consistently applied, and evidence-based (contrast that with, for instance, an approach in which per-user engagement models are built). It’s also worth noting that others have independently arrived at similar methodologies for measuring active engagement.

It’s important to note, though, that we made a few mistakes early on that others might want to avoid:
  • Measuring engagement as a binary variable across a ping interval: For the most part, our code pings measurements back to our servers every 15 seconds (more on that later). An early attempt at measuring engaged time recorded a visitor as either engaged for 15 seconds (if they’d interacted with the site at all since the last ping) or 0 seconds (if not). That gives undue credit if, for instance, a visitor engages just before a ping goes out.
  • Tracking too many interactions (especially on mobile): When we initially began tracking mobile sites, we thought we’d listen to every possible user interaction event to ensure we didn’t miss engagement. We quickly had to change course, though, after hearing customer complaints that our event listeners were adversely affecting site interactions. There’s a balance between tracking every possible event and ensuring the performance of clients' sites.
  • Not correcting for prerender/prefetch: Before our MRC audit, we’d never seriously considered correcting for prerendering (browsers such as Chrome can render pages before you actually visit them to improve page load time). Having now done it, we can say: you need to correct for prerendering if you want your counts to be correct at all (see the sketch after this list).
  • Not considering edge cases: Does your code correctly measure if a person has two monitors? If the height of the browser is less than the height of an ad? If the person is using IE 6? If a virtual page change occurs or an ad refreshes, are you attributing measurements to the right entity?
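On the prerender point in particular, the correction can be as simple as deferring measurement until the page actually becomes visible. Here’s a sketch using the Page Visibility API; it shows the general idea rather than our exact implementation, and startMeasurement is a placeholder for whatever kicks off the engagement loop.

```javascript
// Sketch: defer measurement until a prerendered page actually becomes
// visible, using the Page Visibility API. startMeasurement is a
// placeholder for the code that starts the engagement loop.
function startWhenVisible(startMeasurement) {
  if (document.visibilityState === 'prerender') {
    // Chrome may fetch and render this page before the visitor ever
    // sees it; counting from here would inflate every number.
    document.addEventListener('visibilitychange', function onVisible() {
      if (document.visibilityState !== 'prerender') {
        document.removeEventListener('visibilitychange', onVisible);
        startMeasurement();
      }
    });
  } else {
    startMeasurement();
  }
}
```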
Because devices, events available to our JavaScript, and the patterns of consumption on the internet change with time, we revisit this measurement methodology annually. If you’re interested in contributing, get in touch: josh@chartbeat.com

Ping Timing

Measurements are taken by our JavaScript every second, but it would cause undue load on users’ browsers if we were to send this measurement data back every second. In that sense, there’s a balance we need to strike between accurate measurement and being good citizens of the internet. Here’s the balance we struck (a code sketch follows the list):
    1. When a visitor is active on a page, we ping data back every 15 seconds of wall clock time.
    2. If they go idle, by definition our numbers (except raw time on page) aren’t updating, so there’s no need to ping as frequently. We use an exponential backoff — pinging every 30 seconds, then 60 seconds, then 120 seconds, and so on — and immediately bounce back to our 15-second timing if a visitor reengages with the page.
    3. When a qualifying event occurs, we ping that data immediately to ensure that we record all instances of these events. Currently, the qualifying events are a new ad impression being served and an ad meeting the IAB viewability standard.
    4. When a user leaves the page, we take a set of special steps to attempt to record the final bit of engagement between their last ping and the current time (a gap of up to 14 seconds). As the visitor leaves, we write the final state of the browsing session to localStorage. If the visitor goes on to visit another page on the same site and our JavaScript finds data about a previous session in localStorage, it pings that final data along.
    5. We also hook into the onbeforeunload event, which is called when a visitor exits a page, and send a “hail Mary” ping as the person exits. Because the page is unloading, there’s no guarantee that the ping request will complete, but it’s a best-effort attempt to get the visitor’s final state across in case they never visit another page on the site (in which case the method described in (4) can’t help).
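Pulled together, the schedule might look roughly like the sketch below. The constant values match the description above; everything else (names, endpoint, stubs such as currentState() and isIdle()) is illustrative.

```javascript
// A condensed sketch of the ping schedule described above. The intervals
// match the post; the names, endpoint, and stubs are placeholders.

var BASE_INTERVAL_MS = 15000;            // (1) 15s while the visitor is active
var currentInterval = BASE_INTERVAL_MS;
var pingTimer = null;

// Stubs standing in for the real bookkeeping.
function currentState() { return {}; }   // real code reports engagement counters
function isIdle() { return false; }      // real code checks time since interaction

function sendPing(state) {
  // An image beacon is a simple fire-and-forget transport for 2014 browsers.
  new Image().src = '/ping?data=' + encodeURIComponent(JSON.stringify(state));
}

function schedulePing() {
  pingTimer = setTimeout(function () {
    sendPing(currentState());
    if (isIdle()) { currentInterval *= 2; } // (2) backoff: 15s, 30s, 60s, 120s...
    schedulePing();
  }, currentInterval);
}

// (2) Reengagement snaps the schedule back to the 15-second cadence;
// this would be called from the interaction handlers.
function onReengage() {
  currentInterval = BASE_INTERVAL_MS;
  clearTimeout(pingTimer);
  schedulePing();
}

// (3) Qualifying events (a new ad impression; an ad first meeting the IAB
// viewability standard) would call sendPing(currentState()) immediately.

// (4) On exit, save the final state so the next same-site pageview can
// forward it, and (5) fire a best-effort "hail Mary" ping.
window.addEventListener('beforeunload', function () {
  localStorage.setItem('finalSessionState', JSON.stringify(currentState()));
  sendPing(currentState()); // may not complete before the page unloads
});

// (4, continued) On load, flush any state a previous pageview left behind.
var previousState = localStorage.getItem('finalSessionState');
if (previousState !== null) {
  sendPing(JSON.parse(previousState));
  localStorage.removeItem('finalSessionState');
}

schedulePing();
```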
If you found any of this useful, insightful, or flat-out incorrect, we’d love to be in touch. Feel free to reach out to me directly: Twitter @joshuadchwartz, email josh@chartbeat.com

Traffic During the Facebook Outage

August 4th, 2014 by Josh

As you’ve all but certainly heard, Facebook had a major outage midday on Friday. Overall traffic on news sites dropped by 3%, thousands took to Twitter to voice their frustration, and, apparently, a select few called the LA Sheriff's Department. Most interestingly for us, the Facebook outage provided a natural experiment to look at what the world of web traffic looks like without Facebook. Here, I’ll delve into two issues that are particularly interesting to look at through the lens of the outage.

Facebook and dark social

So-called “dark social” traffic — traffic to articles that lacks a referrer because it comes via HTTPS or apps — is subject to endless speculation. What portion of it comes from emailed links? From links sent via instant messaging? From standard social sources like Facebook and Twitter but with the referrer obscured? From search sites that use HTTPS? By virtue of the fact that no explicit referrer is sent, it’s impossible to tell for sure. Since Facebook makes up a huge portion of non-dark traffic, one might guess that a big chunk of dark traffic is actually Facebook traffic in disguise.

Of course, during the outage virtually all Facebook traffic was stopped, so we can use that data to ask how much dark traffic was definitely not coming from Facebook. The answer? Very little of it was coming from Facebook directly. Take a look at the graph below.

[Figure: Facebook referral and dark social traffic during the outage]

Facebook referrals dropped by almost 70% during the outage (note that traffic didn’t drop to 0, presumably because some number of people had Facebook pages open before the outage). There’s certainly a drop in dark social, but it’s not nearly as stark: dark social traffic just before the outage was only 11% higher than at its low point during the outage. Since Facebook traffic dropped by 70%, at most 16% (11% / 70%) of dark social traffic could’ve been directly attributable to Facebook.
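To spell out the arithmetic behind that bound: if a fraction f of dark social traffic were really Facebook traffic in disguise, a 70% drop in Facebook referrals should have cut dark social traffic by about 0.7 × f. Setting 0.7 × f equal to the observed 11% drop gives f ≈ 0.16, hence the 16% ceiling.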

Now, of course, we’d expect some other social sharing might be negatively impacted — if people aren’t discovering articles on Facebook, they might not be sharing them in other ways. So that doesn’t mean that 16% of dark social traffic is from Facebook, but it does provide strong evidence that at least 84% of dark social traffic is something other than Facebook traffic in disguise.

Where people go in an outage

As I discussed in my last post, a huge percentage of mobile traffic comes from Facebook. Given that, we’d probably expect mobile traffic to be hardest hit during the outage. And, indeed, entrances to sites on mobile devices were down 8.5%, when comparing the minute before the outage to the lowest point while Facebook was down.

Interestingly, though, we see the opposite effect on desktops: a 3.5% overall increase in desktop traffic after the beginning of the outage. That increase was largely fueled by a 9% increase in homepage direct traffic on sites with loyal homepage followings. We saw no increases in traffic via other referrers, including Twitter and Google News, during the outage. While we certainly can't claim that the outage was the cause of that uptick in desktop traffic, the timing is certainly notable.

[Figure: desktop direct homepage traffic around the start of the outage]

In short, then: our brief world without Facebook looked a bit different, albeit in predictable ways. Significantly less news was consumed on phones, slightly more homepages were visited on desktops, and 30 minutes later, when Facebook came back online, traffic returned to normal.

The Homepage, Social, and the Rise of Mobile

July 28th, 2014 by Josh

In the much-circulated New York Times Innovation Report, perhaps the most discussed graph was this one, showing a roughly 40% decline in homepage audience over the past three years.

[Figure: homepage audience decline, from the New York Times Innovation Report]

That graph prompted innumerable articles announcing the “death of the homepage,” in The Atlantic, Poynter, and on numerous blogs. Most hinged on the relationship between the rise of social traffic and the decrease in homepage traffic. One thing that isn’t mentioned in most of these articles, though, is that the rise in social traffic was contemporaneous with a rise in mobile traffic, and that mobile is as much a part of the story as social is. Here, I’d like to explore the three-way interaction between mobile traffic, social traffic, and homepage visitation.

Social traffic and mobile devices

The importance of social sharing on mobile devices is much discussed. (Take, for example, the recent ShareThis report, which found that 63% of Twitter activity and 44% of Facebook activity happens on mobile.) People aren’t just using social media on mobile to share articles, of course; they’re also clicking through to those articles. Below, we break down the share of traffic coming from Facebook and Twitter by device across a random sample of our sites. (Note: We specifically chose sites without separate mobile sites and without mobile apps, to ensure that we’re making fair comparisons across devices.)

[Figure: share of traffic from Facebook and Twitter, by device]

Facebook’s share of overall mobile referrals is nearly 2.7x larger than its share on desktop. Twitter’s share is 2.5x larger on mobile than on desktop. And, if anything, those numbers likely undercount the significance of social referrals, since many apps don’t forward referrer information and get thrown into the bucket of “dark social.” In some sense, then, it’s fair to say that—for most sites—mobile traffic more-or-less is social traffic.

Mobile and homepage traffic

Setting aside where visitors come from, mobile visitors are substantially less likely to interact with a site’s homepage. Below we plot, for the same collection of sites as above, the fraction of visitors that have visited any landing page (e.g., the homepage or a section front) over a month.

[Figure: fraction of visitors visiting any landing page, by device]

What we see is dramatic: Desktop visitors are over 4x more likely to visit landing pages than those on phones.

Is that because mobile visitors come from social sources, and social visitors are less likely to visit landing pages—a fact that’s often cited when discussing the state of homepage traffic? Or is it not an issue of referrer at all—are mobile visitors intrinsically less likely to visit landing pages? To move toward an answer, we can control for referrer and ask the same question. Below, we plot the fraction of visitors who come to the site from Facebook and then, during the same month (but not necessarily on the same visit), visit a landing page.

[Figure: fraction of Facebook-referred visitors visiting a landing page, by device]

Comparing this graph to the previous one, three things are clear:

  1. As discussed above, mobile visitors are significantly less likely to ever visit landing pages than desktop and tablet visitors.
  2. Similarly, visitors who come from Facebook are significantly less likely to ever visit landing pages than those who come from other sources. On average, only 6% of visitors who come from Facebook ever visit a landing page, compared to nearly 14% of overall visitors.
  3. These two phenomena are to some degree independent—desktop-based Facebook visitors are half as likely to visit landing pages as other desktop-based visitors, while mobile Facebook visitors are one-third as likely to visit homepages as other mobile visitors.

It’s also worth a quick note that, in all of these respects, tablet traffic is much closer to desktop traffic than it is to mobile traffic.

Overall, this seems to be cause for substantial concern to publishers—increases in social and mobile traffic are the two most significant traffic trends of the past few years, and both are strongly associated with drops in homepage traffic. Since, as we’ve seen before, homepage visitors are typically a site’s most loyal audience, potential drops in homepage visitors should be concerning. In the short term, it’s safe to assume that a successful mobile strategy will hinge upon a steady stream of social links—that visitors won’t return unless we reach out to them directly. In the longer term, there’s a lot of work for all of us in determining how best to build an audience in a post-desktop (and potentially post-homepage) world.

Revisiting Return Rates

July 14th, 2014 by Josh

Starting today, we’ve updated our definition of return rate in both our Weekly Perspectives and in the Chartbeat Publishing dashboard. Consequently, you’re likely to see a shift in the numbers in your dashboard — so we wanted to write a quick note explaining the change, why we made it, and what you can expect to see.

Defining return rate

Return rate, if you’re not familiar with it, is a metric designed to capture the quality of traffic that typically comes from a referrer. It measures the fraction of visitors coming from a given referrer who return to a site later — if 1,000 people come to a site from, say, Facebook, should we expect 10 of them to come back or 500? Depending on the answer, we might interpret and respond to a spike from Facebook quite differently.

While the intuition behind return rate is straightforward, the actual formula used to calculate it is a bit more up for grabs. Up until now, we’ve calculated return rates as:

    return rate = (visits from the referrer that are followed by a return) / (total visits from the referrer)

That formula roughly captures a notion of “how likely is it, for a given visit from Facebook, that that visit will be ‘converted’ into a return?” As we’ve talked through that definition over the past year, we’ve come to realize that it’s more natural to phrase returns in terms of people, not visits — to ask “how likely is it, for a given visitor from Facebook, that that person will be ‘converted’ into a return?” Hence, we’re now using the following calculation:

    return rate = (visitors from the referrer who later return) / (total visitors from the referrer)

So, rather than speaking in units of “visits,” this definition speaks in units of “visitors” — a seemingly small (but significant) change. In addition, we’re now only counting a return if it occurs at least an hour after the initial entrance, which corrects for a pattern we sometimes see where visitors enter a site and then re-enter a few minutes later.
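For concreteness, here’s a small JavaScript sketch of the new, visitor-based calculation as it could be computed from a log of visits. The field names and the code are illustrative, not the implementation behind the dashboard; only the one-hour rule comes from the definition above.

```javascript
// Sketch: visitor-based return rate for a single referrer.
// visits: array of { visitorId, referrer, timestamp }, timestamps in ms.
function returnRate(visits, referrer) {
  var HOUR_MS = 60 * 60 * 1000;

  // Earliest entrance from the referrer, per visitor.
  var entrances = {};
  visits.forEach(function (v) {
    if (v.referrer === referrer &&
        (!(v.visitorId in entrances) || v.timestamp < entrances[v.visitorId])) {
      entrances[v.visitorId] = v.timestamp;
    }
  });

  // A visitor is "converted into a return" only if they come back at
  // least an hour after that entrance.
  var ids = Object.keys(entrances);
  var returned = ids.filter(function (id) {
    return visits.some(function (v) {
      return v.visitorId === id && v.timestamp >= entrances[id] + HOUR_MS;
    });
  }).length;

  return ids.length > 0 ? returned / ids.length : 0;
}
```

So, for example, 1,000 visitors entering from Facebook, 120 of whom show up again an hour or more later, gives a return rate of 12%.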

What's changing?

It’s likely that the return rate numbers in your dashboard and Weekly Perspectives will drop under this new definition. To help you sort out whether your numbers are trending up or down, we’ve gone back and recalculated reports using the new methodology, going back to the beginning of June. We hope that the transition to the new definition is painless, but if you have any questions, feel free to comment or get in touch with me at josh@chartbeat.com

On Engagement & Viewability: Why Quality Content Makes Good Business Sense

June 19th, 2014 by Josh

On March 31, the Media Rating Council (MRC) announced it was lifting its advisory on viewable impressions for display advertising, bringing the industry one step closer to transacting on viewability for the first time. The point at which publishers are asked to deliver highly viewable campaigns is rapidly approaching. If you haven’t started to develop a strategy to maximize the viewability of your ads, I’d wager that in the next three months, you will.

There are many tactics that can be applied to improve your ads’ viewability: ensuring fast ad loads, lazy-loading advertisements, and redesigning a website to feature always-in-view units.

One issue has gotten surprisingly little discussion, though: ads are much more viewable on pages that people actually want to read. Take a look at the following figure, which was computed from a sample of a billion ad impressions over the month of May 2014.

[Figure: ad viewability by Engaged Time]

We see there’s a strong relationship between the fraction of ads that are seen and how long a person spends reading the page: as Engaged Time increases from 15 seconds to one minute, viewability goes up by over half, from 37% to 57%. Visitors who read for more than 75 seconds see more than 60% of advertisements.

This isn’t too surprising: of course, people who read pages more deeply see more of the ads on the page. But it’s still worth taking note. We’ve argued for years that articles with higher average Engaged Time should be promoted because they represent the articles your audience is most interested in, and — in a time when viewability is more critical than ever — promoting your most deeply read articles makes good business sense, too.


Want more? Download the Chartbeat Quarterly.