2015 was a big year for top-quality journalism. Just looking at the 20 most read stories across the Chartbeat network, it’s clear that a heartening mix of longform reports and critical resources for breaking news captured and held the world’s attention this year. Quality content shone, even as the relationship between media and technology continued to shift – especially in the realms of mobile traffic, distribution platforms, and ad blocking.

In 2015, more than 70% of sites we measured saw traffic from mobile devices increase, and Facebook, as in prior years, generated the largest share of mobile traffic. In contrast to prior years, though, Facebook’s share of traffic itself was constant for most sites. That said, there’s no denying that the new channels for content distribution, like Instant Articles, Snapchat Discover, and Google AMP, will only grow in importance over 2016, presenting an opportunity for publishers to build their audiences. And this is the key. Even as some publishers, especially in Germany, are reporting high rates of ad blocking, by prioritizing audience, embracing new channels, and doubling down on speedy browsing we can build an even brighter media landscape for years to come.

So take some time to read Past/Forward. In it, we’ve proposed eight New Year’s resolutions for digital publishers seeking an outstanding 2016. We walk you through cutting down page load times, growing your loyal audience, writing winning headlines — pretty much everything future-focused publishers should strive for.

You can find Tony Haile’s forecast for 2016 and our eight digital media resolutions in Past/Forward.

If you’re reading this post, you’ve likely already read our recent announcement about gaining accreditation from the Media Rating Council for measuring a number of metrics, including viewability and active exposure time. You may have also read the Advertising Age feature on the Financial Times’ plan to trade ads on those metrics — which gives a view into why we’re so excited about the possibility of valuing ad inventory based on the amount of time audiences spend with it.

We went through the process of accreditation to bring us one step closer to a world where ads can be valued by the duration for which they’re seen. But the thing is, trading on time only works if

  1. Both buyers and sellers agree that time is a valuable unit to trade on;
  2. There’s a unified standard for how time should be measured; and
  3. Both parties have access to the data and infrastructure on which to trade.

We think the case for transacting on time is clear and compelling: time in front of eyeballs is closer to what advertisers are hoping to buy than impressions (viewable or not) are, and — as one example — Jason Kint has made a persuasive case for why time-based valuation benefits publishers as well. On points (2) and (3), though, we think there’s still a long way to go.

On the measurement side, it’s critically important that — at least while there’s no IAB standard for time-based measurement — measurers be completely transparent about their exact methodologies so that buyers and sellers can understand exactly what the numbers they’re looking at mean. And, on the product side, even with an expanding set of products and a growing customer base, there’s simply more to be built than we ever will ourselves, and we think the industry will benefit from as many companies being involved as possible.

To address both of those points, I’m excited to announce today that we’re publicly releasing our Description of Methodology.

This is the main document on which our accreditation is based — it details the exact process of measurement that we use. Insofar as we have any “secret sauce” in our measurements, it’s in that document. Our goal in releasing it is twofold:

  • We think we all will benefit from others’ careful analysis, critique, and improvement upon the techniques we’ve proposed. Our hope is that others will adopt and refine our measurement process, and that the ability of all parties to accurately measure user engagement will improve over time.
  • The entire industry benefits from more people thinking carefully about their numbers, and we want it to be easier for other companies to gain accreditation. When we began our accreditation process, our largest hurdle was simply the fact that we didn’t have good examples of what successfully audited companies’ practices looked like. Our hope is that reading our document helps others down the line make it through in short order.

Having spent several years refining our process, there are a few hard-fought bits of knowledge that I wanted to highlight.

Measuring Engagement

Our method of tracking engagement has been derived from a set of human-subject experiments, and comes down to a simple rule: at each second, our code makes a determination of whether or not the reader is actively engaged with the page and keeps a running counter of that engagement.

In determining engagement, our code asks several questions:

  1. Is this browser tab the active, in-focus tab? If not, the user is certainly not actively engaged. If so, continue on to (2).
  2. Has the reader made any sort of console interaction (mousemove, keystroke, etc.) in the last 5 seconds? If not, the user is not actively engaged. If so, consider them actively engaged and give one second of credit toward the visitor’s engaged time on the page.
  3. For each ad unit on the page: If conditions (1) and (2) have been met, is this ad unit viewable under the IAB viewability standard? If no, the ad has not been actively viewed this second. If so, give the ad one second of credit toward its active exposure time.
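As a concrete illustration, the three checks above can be collapsed into a single per-second decision. This is a hypothetical sketch (the state shape and names are ours, not Chartbeat’s actual code):

```javascript
// Hypothetical sketch of the per-second engagement check described above.
// state: { tabFocused, msSinceLastInteraction, adViewable }
// Returns one second of credit for the page and for an ad unit.

const IDLE_THRESHOLD_MS = 5000; // the five-second rule

function creditForSecond(state) {
  // (1) Inactive tab: the user is certainly not engaged.
  if (!state.tabFocused) return { page: 0, ad: 0 };

  // (2) No console interaction in the last 5 seconds: not actively engaged.
  if (state.msSinceLastInteraction > IDLE_THRESHOLD_MS) return { page: 0, ad: 0 };

  // (3) Engaged: one second of page credit; one second of ad credit only if
  // the ad unit is viewable under the IAB standard during this second.
  return { page: 1, ad: state.adViewable ? 1 : 0 };
}
```

Summing these per-second credits over a session yields the visitor’s engaged time on the page and each ad unit’s active exposure time.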

The five-second rule is an approximation (it’s easy to construct cases where a person, say, stares at a screen for a minute without touching their computer), but we believe we’ve collected compelling evidence that it correctly captures the vast majority of time spent. It also provides a good measurement of guaranteed time spent: it’s difficult to construct a scenario in which a visitor is measured as active by our system but isn’t actually looking at the page. The rule meets the IAB requirement of being a clear, consistently applied, evidence-based standard (contrasted with, for instance, an approach in which per-user engagement models are built). It’s also worth noting that others have independently arrived at similar methodologies for measuring active engagement.

It’s important to note, though, that we made a few mistakes early on that others might want to avoid:

  • Measuring engagement as a binary variable across a ping interval: For the most part, our code pings measurements back to our servers every 15 seconds (more on that later). An early attempt at measuring engaged time recorded a visitor as either engaged for 15 seconds (if they’d interacted with the site at all since the last ping) or 0 seconds (if not). That gives undue credit if, for instance, a visitor engages just before a ping goes out.
  • Tracking too many interactions (especially on mobile): When we initially began tracking mobile sites, we thought we’d listen to every possible user interaction event to ensure we didn’t miss engagement. We quickly had to change course, though, after hearing customer complaints that our event listeners were adversely affecting site interactions. There’s a balance between tracking every possible event and ensuring the performance of clients’ sites.
  • Not correcting for prerender/prefetch: Before the MRC audit, we’d never seriously considered correcting for prerendering (browsers such as Chrome can render pages before you actually visit them to improve page load time). Having now done so, we can say plainly: you need to correct for prerendering if you want your counts to be accurate at all.
  • Not considering edge cases: Does your code correctly measure if a person has two monitors? If the height of the browser is less than the height of an ad? If the person is using IE 6? If a virtual page change occurs or an ad refreshes, are you attributing measurements to the right entity?

Because devices, events available to our JavaScript, and the patterns of consumption on the internet change with time, we revisit this measurement methodology annually. If you’re interested in contributing, get in touch:

Ping Timing

Measurements are taken by our JavaScript every second, but it would cause undue load on users’ browsers if we were to send this measurement data back every second. In that sense, there’s a balance we need to strike between accurate measurement and being good citizens of the internet. Here’s the balance we struck:

    1. When a visitor is active on a page, we ping data back every 15 seconds of wall clock time.
    2. If they go idle, by definition our numbers (except raw time on page) aren’t updating, so there’s no need to ping as frequently. We use an exponential backoff — pinging every 30 seconds, then 60 seconds, then two minutes, etc — and immediately bounce back to our 15 second timing if a visitor reengages with the page.
    3. When a qualifying event occurs, we ping that data immediately to ensure that we record all instances of these events. Currently, the qualifying events are a new ad impression being served and an ad meeting the IAB viewability standard.
    4. When a user leaves the page, we take a set of special steps to attempt to record the final bit of engagement between their last ping and the current time (a gap of up to 14 seconds). As the visitor leaves, we write the final state of the browsing session to localStorage. If the visitor goes on to visit another page on the same site and our JavaScript finds data about a previous session in localStorage, it pings that final data along.
    5. We also hook into the onbeforeunload event, which is called when a visitor exits a page, and send a “hail Mary” ping as the person exits — because the page is unloading there’s no guarantee that the ping request will complete, but it’s a best effort attempt to get the visitor’s final state across in the case that they never visit another page on the site (in which case the method described in (4) isn’t able to help).

If you found any of this useful, insightful, or flat-out incorrect, we’d love to be in touch. Feel free to reach out to me directly: Twitter @joshuadchwartz, email

As you’ve all but certainly heard, Facebook had a major outage midday on Friday. Overall traffic on news sites dropped by 3%, thousands took to Twitter to voice their frustration, and, apparently, a select few called the LA Sheriff’s Department. Most interestingly for us, the Facebook outage provided a natural experiment to look at what the world of web traffic looks like without Facebook. Here, I’ll delve into two issues that are particularly interesting to look at through the lens of the outage.

Facebook and dark social

So-called “dark social” traffic — traffic to articles that lacks a referrer because it comes via HTTPS or apps — is subject to endless speculation. What portion of it comes from emailed links? From links sent via instant messaging? From standard social sources like Facebook and Twitter but with the referrer obscured? From search sites that use HTTPS? By virtue of the fact that no explicit referrer is sent, it’s impossible to tell for sure. Since Facebook makes up a huge portion of non-dark traffic, one might guess that a big chunk of dark traffic is actually Facebook traffic in disguise.

Of course, during the outage virtually all Facebook traffic was stopped, so we can use that data to ask how much dark traffic was definitely not coming from Facebook. The answer? Very little of it was coming from Facebook directly. Take a look at the graph below.


Facebook referrals dropped by almost 70% during the outage (note that traffic didn’t drop to 0, presumably because some number of people had Facebook pages open before the outage). There’s certainly a drop in dark social, but it’s not nearly as stark, and dark social traffic just before the outage was only 11% higher than at its low point during the outage. Since 70% of Facebook traffic dropped off, that would imply that at most 16% (11% / 70%) of traffic could’ve been directly attributable to Facebook.

Now, of course, we’d expect some other social sharing to be negatively impacted as well — if people aren’t discovering articles on Facebook, they might not be sharing them in other ways. So, that doesn’t mean that 16% of dark social traffic is from Facebook, but it does provide strong evidence that at least 84% of dark social traffic is something other than Facebook traffic in disguise.
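The upper bound in the paragraphs above is simple arithmetic, worth making explicit (the helper function is ours, for illustration only):

```javascript
// If dark social fell by darkSocialDrop (as a fraction) while Facebook
// referrals fell by facebookDrop, then at most darkSocialDrop / facebookDrop
// of dark social traffic could have been Facebook traffic in disguise.
function maxShareFromFacebook(darkSocialDrop, facebookDrop) {
  return darkSocialDrop / facebookDrop;
}

// With the observed drops of 11% (dark social) and 70% (Facebook):
// maxShareFromFacebook(0.11, 0.70) is roughly 0.16, i.e. at most ~16%.
```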

Where people go in an outage

As I discussed in my last post, a huge percentage of mobile traffic comes from Facebook. Given that, we’d probably expect mobile traffic to be hardest hit during the outage. And, indeed, entrances to sites on mobile devices were down 8.5%, when comparing the minute before the outage to the lowest point while Facebook was down.

Interestingly, though, we see the opposite effect on desktops: a 3.5% overall increase in desktop traffic after the beginning of the outage. That increase was largely fueled by a 9% increase in homepage direct traffic on sites with loyal homepage followings. We saw no increases in traffic via other referrers, including Twitter and Google News, during the outage. While we certainly can’t claim that the outage was the cause of that uptick in desktop traffic, the timing is certainly notable.


In short, then: our brief world without Facebook looked a bit different, albeit in predictable ways. Significantly less news was consumed on phones, slightly more homepages were visited on desktops, and 30 minutes later, when Facebook came back online, traffic returned to normal.

In the much-circulated New York Times Innovation Report, perhaps the most discussed graph was this one, showing a roughly 40% decline in homepage audience over the past three years.


That graph prompted innumerable articles announcing the “death of the homepage” — in The Atlantic, Poynter, and on numerous blogs. Most hinged on the relationship between the rise of social traffic and the decline in homepage traffic. One thing most of these articles don’t mention, though, is that the rise in social traffic was contemporaneous with a rise in mobile traffic, and that mobile is as much a principal part of the story as social is. Here, I’d like to explore the three-way interaction between mobile traffic, social traffic, and homepage visitation.

Social traffic and mobile devices

The importance of social sharing on mobile devices is much discussed. (Take for example, the recent ShareThis report, which reported that 63% of Twitter activity and 44% of Facebook activity happens on mobile.) People aren’t just using social media on mobile to share articles, of course, they’re also clicking to those articles. Below, we break down the share of traffic coming from Facebook and Twitter by device across a random sample of our sites. (Note: We specifically chose sites without separate mobile sites and without mobile apps, to ensure that we’re making fair comparisons across devices.)


Facebook’s share of overall mobile referrals is nearly 2.7x larger than its share on desktop. Twitter’s share is 2.5x larger on mobile than on desktop. And, if anything, those numbers likely undercount the significance of social referrals, since many apps don’t forward referrer information and get thrown into the bucket of “dark social.” In some sense, then, it’s fair to say that—for most sites—mobile traffic more-or-less is social traffic.

Mobile and homepage traffic

Setting aside where visitors come from, mobile visitors are substantially less likely to interact with a site’s homepage. Below we plot, for the same collection of sites as above, the fraction of visitors that have visited any landing page (e.g. the homepage, a section front) over a month.


What we see is dramatic: Desktop visitors are over 4x more likely to visit landing pages than those on phones.

Is that because mobile visitors come from social sources, and social visitors are less likely to visit landing pages — a fact that’s often cited when discussing the state of homepage traffic? Or is it not an issue of referrer at all — are mobile visitors intrinsically less likely to visit landing pages? To move toward an answer, we can control for referrer and ask the same question. Below, we plot the fraction of visitors who come to the site from Facebook and then, during the same month (but not necessarily on the same visit), visit a landing page.


Comparing this graph to the previous one, three things are clear:

  1. As discussed above, mobile visitors are significantly less likely to ever visit landing pages than desktop and tablet visitors.
  2. Similarly, visitors who come from Facebook are significantly less likely to ever visit landing pages than those who come from other sources. On average, only 6% of visitors who come from Facebook ever visit a landing page, compared to nearly 14% of overall visitors.
  3. These two phenomena are to some degree independent — desktop-based Facebook visitors are half as likely to visit landing pages as other desktop-based visitors, while mobile Facebook visitors are one-third as likely to visit landing pages as other mobile visitors.

It’s also worth a quick note that, in all of these respects, tablet traffic is much closer to desktop traffic than it is to mobile traffic.

Overall, this seems to be cause for substantial concern to publishers—increases in social and mobile traffic are the two most significant traffic trends of the past few years, and both are strongly associated with drops in homepage traffic. Since, as we’ve seen before, homepage visitors are typically a site’s most loyal audience, potential drops in homepage visitors should be concerning. In the short term, it’s safe to assume that a successful mobile strategy will hinge upon a steady stream of social links—that visitors won’t return unless we reach out to them directly. In the longer term, there’s a lot of work for all of us in determining how best to build an audience in a post-desktop (and potentially post-homepage) world.

Revisiting Return Rates

July 14th, 2014 by Josh

Starting today, we’ve updated our definition of return rate in both our Weekly Perspectives and in the Chartbeat Publishing dashboard. Consequently, you’re likely to see a shift in the numbers in your dashboard — so we wanted to write a quick note explaining the change, why we made it, and what you can expect to see.

Defining return rate

Return rate, if you’re not familiar with it, is a metric designed to capture the quality of traffic that typically comes from a referrer. It measures the fraction of visitors coming from a given referrer who return to a site later — if 1,000 people come to a site from, say, Facebook, should we expect 10 of them to come back or 500? Depending on the answer, we might interpret and respond to a spike from Facebook quite differently.

While the intuition behind return rate is straightforward, the actual formula used to calculate it is a bit more up for grabs. Up until now, we’ve calculated return rates using the following formula:

return rate = (number of visits from the referrer that are later followed by a return visit) ÷ (total number of visits from the referrer)

That formula roughly captures a notion of “how likely is it, for a given visit from Facebook, that that visit will be ‘converted’ into a return?”


As we’ve talked through that definition over the past year, we’ve come to realize that it’s more natural to phrase returns in terms of people, not visits — to ask “how likely is it, for a given visitor from Facebook, that that person will be ‘converted’ into a return?” Hence, we’re now using the following calculation:

return rate = (number of visitors from the referrer who later return to the site) ÷ (total number of visitors from the referrer)

So, rather than speaking in units of “visits,” this definition speaks in units of “visitors” — a seemingly small (but significant) change.

In addition, we’re now only counting a return if it’s at least an hour after the initial entrance, which corrects for a pattern we sometimes see where visitors enter a site and then re-enter a few minutes later.
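Putting the visitor-based definition together with the one-hour rule, the calculation could be sketched like this (the data shape and function are hypothetical, not Chartbeat’s actual pipeline):

```javascript
// Hypothetical sketch of the visitor-based return rate with the one-hour rule.
// entrances: [{ visitorId, referrer, time }], where time is in ms since epoch.
const ONE_HOUR_MS = 60 * 60 * 1000;

function returnRate(entrances, referrer) {
  const fromReferrer = entrances.filter(e => e.referrer === referrer);
  if (fromReferrer.length === 0) return 0;

  // Count unique visitors, not visits, arriving from this referrer.
  const visitors = new Set(fromReferrer.map(e => e.visitorId));
  let returned = 0;
  for (const id of visitors) {
    // The visitor's first entrance from this referrer.
    const first = Math.min(
      ...fromReferrer.filter(e => e.visitorId === id).map(e => e.time)
    );
    // Count a return only if some later entrance (from any referrer) is at
    // least an hour after that first entrance.
    const cameBack = entrances.some(
      e => e.visitorId === id && e.time >= first + ONE_HOUR_MS
    );
    if (cameBack) returned += 1;
  }
  return returned / visitors.size;
}
```

A visitor who re-enters a few minutes after their first visit contributes nothing here, which is exactly the short-gap pattern the one-hour rule corrects for.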



What’s changing?

It’s likely that the return rate numbers in your dashboard and Weekly Perspectives will drop under this new definition. To help you sort out whether your numbers are trending up or down, we’ve gone back and recalculated reports using the new methodology, going back to the beginning of June.

We hope that the transition to the new definition is painless, but if you have any questions, feel free to comment or get in touch with me at