As many major publishers and platforms have transitioned to HTTPS (a great move for user privacy and security), a side effect has been a commensurate rise in dark social traffic — traffic that can’t be attributed to a particular referrer. Luckily, sites using HTTPS can still have their outbound traffic properly attributed if they choose to do so (e.g. by using the meta referrer tag). We’ve chronicled major changes to dark social attribution here to ensure that publishers are up to date on the meaning of their traffic sources.

One of the largest sources of dark social on the web has been the Yahoo homepage, which drives an enormous amount of traffic and moved to HTTPS over the past year, causing its traffic to become dark for publisher sites that don’t default to HTTPS. For publishers who have partnerships with Yahoo, this has made it difficult to directly attribute the volume of traffic they’re receiving.

On June 2, though, Yahoo pushed a change to add a meta referrer tag to their homepage and correctly attribute their traffic to sites using HTTP, and we’ve immediately seen dramatic results, as represented in the figure below. Since the change, we’ve seen a roughly 6x increase in attributable traffic coming from Yahoo, making it one of the most significant referrers on the web. On the day before the change, Yahoo was the 16th largest referrer across the Chartbeat network; in the hours after the change, it jumped to the sixth largest (after Facebook, Google Search, Twitter, Google News, and Bing). Between this change and other updates by LinkedIn and Facebook, we’ve seen significant moves in the past 18 months by many of the world’s largest platforms to ensure that all traffic is correctly attributed.
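For reference, the meta referrer tag mentioned above is a single element in a page’s head; a minimal sketch (the `origin` policy shown is one of several values defined by the referrer policy spec) might look like:

```html
<!-- Ask browsers to send the page's origin (e.g. https://example.com/)
     as the referrer on outbound requests, including HTTPS -> HTTP -->
<meta name="referrer" content="origin">
```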
We’ll continue to work with publishers and platforms to track down sources of dark social, and we’ll keep you updated here as more publishers move into the light.

Technical note: for those used to the hsrd.yahoo.com referrer, traffic sent via this change will carry the referrer yahoo.com, not hsrd.yahoo.com.
Beginning several days ago (the evening of Tuesday, 1/20, to be precise), you may have noticed a significant increase in the traffic on your site from LinkedIn: across our network, traffic from linkedin.com increased by over 3x. Below, we’ll detail why that change occurred and what publishers should expect going forward.

Over the past year, publishers have become increasingly interested in traffic from LinkedIn, as the LinkedIn team has been steadily working to improve their feed experience with the launch of their new mobile app and content platforms. Nevertheless, when looking at referrer traffic in analytics tools like Chartbeat, web traffic from linkedin.com has always seemed smaller than it should be for such a large platform, especially given the volume of traffic we see from LinkedIn’s counterpart apps, which shows up under the referrer name lnkd.in.

On January 20th, that changed when LinkedIn made a change to correctly attribute their traffic, some of which had previously been categorized as dark social. The impact of that change was immediate and significant. Looking at traffic coming from linkedin.com to sites across the Chartbeat network over the last six months, we see two trends: a steady increase over the year, followed by a huge increase at the end of January.
Zooming in on the right side of the graph (January 2016), we can see the immediate change in traffic as the attribution change was pushed.

If we compare numbers from just after the change to the same time during previous weeks, traffic from linkedin.com was up by over 3x.
Some sites saw more than 6x increases in their LinkedIn traffic.
While LinkedIn still isn’t a major traffic source for many types of sites, we expect that many business-, media-, and technology-focused sites will see LinkedIn as a top-10 referrer going forward.

With Facebook’s change last year to help attribute all of their traffic, LinkedIn’s change here, and other work to come, we’re excited to see more traffic correctly attributed. We’ll continue to work with platforms in the coming months to bring their dark social traffic into the light.
In general, when we publish data, it’s always in anonymized, aggregated form — that sort of data lets us identify the cross-network trends relevant to all publishers. Now and then, though, we have occasion to look deeply at one publisher’s data, which lets us pull out facts that might be obscured when aggregating across tens of billions of pageviews. This is one such special occasion.

Many of you have seen our list of the 20 most-read articles of 2015. Today, we wanted to give you an analysis of the data behind the success of the most-read article on that list, The Atlantic’s March issue cover story, “What ISIS Really Wants” by contributing editor Graeme Wood. The piece is the most popular article in the history of The Atlantic and accrued more Total Engaged Time than any other of the 164 million pages published in 2015 across the Chartbeat network of 50,000 sites. At its peak, it was the second highest traffic article page in the Chartbeat network, and it has received sustained traffic for over 10 months.

On its way to becoming the most-read story of the year, this article availed itself of almost every conceivable audience-building opportunity — beginning with a carefully executed launch, and then social and publicity strategies to bring sustained attention to the piece. So the data around this article provides a fantastic lens into the mechanisms by which digital content can succeed.
Our list of top articles was calculated using Total Engaged Time — the sum across all pageviews of each visitor’s active Engaged Time — and the article was an outlier in terms of both components (pageviews and engaged time), so we’ll look into both traffic and engagement in turn.
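As a rough sketch of that calculation (the field names here are illustrative, not Chartbeat’s actual schema), Total Engaged Time is simply a sum over pageviews:

```javascript
// Total Engaged Time: the sum, across every pageview of an article,
// of each visitor's active engaged seconds on that pageview.
// Field names are illustrative.
const pageviews = [
  { visitor: "a", engagedSeconds: 180 },
  { visitor: "a", engagedSeconds: 60 }, // a return visit counts too
  { visitor: "b", engagedSeconds: 240 },
];

const totalEngagedTime = pageviews.reduce(
  (sum, pv) => sum + pv.engagedSeconds,
  0
);

console.log(totalEngagedTime); // 480
```

An article can therefore top the list either by drawing many pageviews, by holding each reader for a long time, or — as in this case — both.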
From the perspective of traffic, this story was incredibly successful, receiving nearly 20 million pageviews between its publication in mid-February 2015 and the end of the year. As we see in the graph below, traffic occurred in three distinct phases: a huge initial volume of traffic upon release, a long tail showing steady traffic over a period of months, and a major second pickup coinciding with the Paris attacks. The patterns of traffic during each phase were quite different, so we’ll examine them one-by-one.
Initial spike (February 16-March)
The piece, along with the rest of the March issue, was posted to TheAtlantic.com the same week that the magazine appeared on newsstands. The page’s initial spike, as we see below, occurred simultaneously for a number of referrers, but the vast majority of traffic in the first four days was driven by Facebook. As we’ll see echoed in the November spike, Facebook has an unparalleled capability for generating massive spikes — spikes on other platforms are not nearly as large. Interestingly, as Facebook traffic died down on 2/20, Google traffic immediately picked up, perhaps in reaction to a number of media responses to the piece that were published on 2/20. This spike in Google traffic formed the basis of the next phase of traffic.
Interim period (April-October 2015)
Although this window was quiet compared to the February and November spikes, the page garnered 2.4 million pageviews in this period by steadily accruing roughly 10,000 pageviews per day. The main traffic driver in this period was Google search, which drove nearly a million pageviews. Twitter traffic was all but non-existent, and Facebook drove a more modest amount of traffic.
While Google carried the day, many other referrers drove what, for any other piece, would’ve been very significant amounts of traffic — traffic from other sites is captured in the “external links” line above. In particular, citations in, and cross-promotion on, the Washington Post drove 97k pageviews, The Huffington Post drove 42k pageviews, and The Guardian and CNN drove 19k pageviews each. Interestingly, audiences coming from other publications were much more engaged with the story than the overall audience, spending an average of over 5 minutes of engaged time on the piece (compared to an overall average of 3 minutes).
Paris attacks (November 13-December 31, 2015)
The largest spike in the article’s history occurred during the Paris attacks of November 2015.
Clearly the main traffic spike was from Facebook, but let’s zoom in to the beginning of the spike:
We see that Google search traffic peaked almost immediately after the attacks, nearly 12 hours before Facebook and Twitter traffic took off. This Google spike is a phenomenon we’ve been seeing for many breaking news events — while Facebook often delivers more traffic over the long term, Google is often the leading source of traffic in the first few minutes after an event. We also see an interesting phase change in Facebook’s promotion of the page at about noon UTC on 11/14.
Beyond just high traffic, the article’s atypically high engagement crowned it as the most-read article of the year. Most stories receive an average of under 45 seconds of engaged time per pageview; this one received over three minutes.

Engaged Time was very strong across all top referrers, including over 3:30 for visitors from Facebook, 2:20 for visitors from Google, and 4:15 for visitors coming from internal promotion. Similarly, engagement was strong on all devices, with an average engaged time of just under 3:00 for mobile and just over 3:40 for desktop.
Let’s put engagement in the context of the article by looking at scroll depth instead of time. Because the article is so long and might take multiple visits to read, we’ll look at the maximum that each unique user attained, even if that took multiple pageviews. Note that the below figures measure scroll depth in pixels, so they understate scroll depth: 10,000 pixels, the first tick mark, is already over 2,000 words into the piece.
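The per-user reduction described above can be sketched as follows (a simplified illustration with made-up user IDs and pixel values, not the actual analysis code):

```javascript
// Max scroll depth (in pixels) per unique user, taken across all of
// that user's pageviews -- so reading over multiple visits counts.
const scrollPings = [
  { user: "u1", maxScrollPx: 4000 },  // first visit
  { user: "u1", maxScrollPx: 12000 }, // came back and read further
  { user: "u2", maxScrollPx: 9000 },
];

const maxDepthByUser = {};
for (const ping of scrollPings) {
  maxDepthByUser[ping.user] = Math.max(
    maxDepthByUser[ping.user] ?? 0,
    ping.maxScrollPx
  );
}

console.log(maxDepthByUser); // { u1: 12000, u2: 9000 }
```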
On desktop, over 40% of readers scrolled through the first 10,000 pixels, and 24% of readers made it to the end of the article.

Turning to repeat visitors to the article, we see two types of repeat visits: visitors who come to the page on subsequent visits and immediately abandon the page (possibly, for example, because they’ve already read it), and visitors who read deeply on subsequent visits. In the first figure below we see scroll depth on subsequent visits across all readers:
That is, overall, visitors scroll less when they visit the article later. However, when we restrict to only visitors who actually engage with the page, we see a different story. In the graph below, we look specifically at desktop readers who engaged with the page (as judged by an engaged time of at least 15 seconds). We see that on subsequent visits these visitors scrolled significantly more deeply than on their first visit.
“What ISIS Really Wants” was a one-in-a-million success, and the vast majority of articles (including great ones) won’t garner anywhere near the level of engagement it received. Nonetheless, a look into its data shows a number of trends that are valuable for pieces of all sizes:
- Most critically, its place as the most-read article of the year clearly shows that there is significant value in deeply researched, high-quality enterprise journalism. The other members of our top-20 list only serve to validate this statement.
- Facebook has no peers when it comes to driving massive traffic spikes; across two spikes Facebook drove over 6 million pageviews to the piece.
- Search engine placement has its value both for long-tail traffic and during breaking news events.
- Even when facing down a tough subject and over 10,000 words, readers are willing to actually read, on all devices and across multiple sittings.
Want to see what other numbers we’ve crunched in the past year? Check out Chartbeat’s look back at 2015 and media-minded resolutions for 2016: Past-Forward.
If you’re reading this post, you’ve likely already read our recent announcement about gaining accreditation from the Media Rating Council for measuring a number of metrics, including viewability and active exposure time. You may have also read the Advertising Age feature on the Financial Times’ plan to trade ads upon those metrics — which gives a view into why we’re so excited about the possibility of people valuing ad inventory based on the amount of time audiences spend with it.

We went through the process of accreditation to bring us one step closer to a world where ads can be valued by the duration for which they’re seen. But the thing is, trading on time only works if:
- Both buyers and sellers agree that time is a valuable unit to trade on
- There’s a unified standard for how time should be measured
- Both parties have access to the data and infrastructure on which to trade.
We think the case for transacting on time is clear and compelling: time in front of eyeballs is closer to what advertisers are hoping to buy than impressions (viewable or not) are, and — as one example — Jason Kint made a compelling case for why time-based valuation benefits publishers as well. On points (2) and (3), though, we think there’s still a long way to go.

On the measurement side, it’s critically important that — at least while there’s no IAB standard for time-based measurement — measurers be completely transparent about their exact methodologies, so that buyers and sellers can understand exactly what the numbers they’re looking at mean. And, on the product side, even with an expanding set of products and a growing customer base, there’s simply more to be built than we will ever build ourselves, and we think the industry will benefit from as many companies being involved as possible.

To address both of those points, I’m excited to announce today that we’re publicly releasing our Description of Methodology. This is the main document on which our accreditation is based — it details the exact process of measurement that we use. Insofar as we have any “secret sauce” in our measurements, it’s in that document. Our goal in releasing it is twofold:
- We think we all will benefit from others’ careful analysis, critique, and improvement upon the techniques we’ve proposed. Our hope is that others will adopt and refine our measurement process, and that the ability of all parties to accurately measure user engagement will improve over time.
- The entire industry benefits from more people thinking carefully about their numbers, and we want it to be easier for other companies to gain accreditation. When we began our accreditation process, our largest hurdle was simply the fact that we didn’t have good examples of what successfully audited companies’ practices looked like. Our hope is that reading our document helps others down the line make it through in shorter order.
Having spent several years refining our process, there are a few hard-won bits of knowledge that I wanted to highlight.
Measuring Engagement

Our method of tracking engagement has been derived from a set of human-subject experiments, and comes down to a simple rule: at each second, our code makes a determination of whether or not the reader is actively engaged with the page and keeps a running counter of that engagement.

In determining engagement, our code asks several questions:
- Is this browser tab the active, in-focus tab? If not, the user is certainly not actively engaged. If so, continue on to (2).
- Has the reader made any sort of console interaction (mousemove, key stroke, etc) in the last 5 seconds? If not, the user is not actively engaged. If so, consider them actively engaged and give one second of credit toward the visitor’s engaged time on the page.
- For each ad unit on the page: If conditions (1) and (2) have been met, is this ad unit viewable under the IAB viewability standard? If no, the ad has not been actively viewed this second. If so, give the ad one second of credit toward its active exposure time.
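The per-second decision procedure above can be sketched roughly as follows (the function and field names are our own illustration, not Chartbeat’s actual code):

```javascript
// One tick of the per-second engagement loop. Returns whether the
// visitor earns a second of engaged time, and which ads (if any)
// earn a second of active exposure time.
function tick(state) {
  // (1) Is this the active, in-focus tab?
  if (!state.tabInFocus) return { engaged: false, adsCredited: [] };

  // (2) Any interaction (mousemove, keystroke, ...) in the last 5s?
  const idleFor = state.now - state.lastInteractionAt;
  if (idleFor > 5) return { engaged: false, adsCredited: [] };

  // (3) Credit each ad unit that is viewable per the IAB standard.
  const adsCredited = state.ads.filter((ad) => ad.iabViewable);
  return { engaged: true, adsCredited };
}

const result = tick({
  tabInFocus: true,
  now: 100,
  lastInteractionAt: 97, // interacted 3 seconds ago
  ads: [
    { id: "top", iabViewable: true },
    { id: "footer", iabViewable: false },
  ],
});
console.log(result.engaged); // true
console.log(result.adsCredited.map((a) => a.id)); // ["top"]
```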
The five-second rule is an approximation (it’s easy to construct cases where a person, say, stares at a screen for 1 minute without touching their computer), but we believe we’ve collected compelling evidence that it correctly measures the vast majority of time spent, and it provides a good measurement for guaranteed time spent — it’s difficult to construct any scenario in which a visitor is measured as active by our system but isn’t actually looking at the page. It also meets the IAB standard for being a clear, consistently applied, evidence-based standard (contrasted with, for instance, a different approach in which per-user engagement models were built). It’s also worth noting that others have independently arrived at similar methodologies for measuring active engagement.

It’s important to note, though, that we made a few mistakes early on that others might want to avoid:
- Measuring engagement as a binary variable across a ping interval: For the most part, our code pings measurements back to our servers every 15 seconds (more on that later). An early attempt at measuring engaged time recorded a visitor as either engaged for 15 seconds (if they’d interacted with the site at all since the last ping) or 0 seconds (if not). That gives undue credit if, for instance, a visitor engages just before a ping goes out.
- Tracking too many interactions (especially on mobile): When we initially began tracking mobile sites, we thought we’d listen to every possible user interaction event to ensure we didn’t miss engagement. We quickly had to change course, though, after hearing customer complaints that our event listeners were adversely affecting site interactions. There’s a balance between tracking every possible event and ensuring the performance of clients’ sites.
- Not correcting for prerender/prefetch: Before the MRC audit, we’d never seriously considered correcting for prerendering (browsers such as Chrome can render pages before you actually visit them to improve page load time). Having now done it, the short version is: you need to correct for prerendering if you want to count at all correctly.
- Not considering edge cases: Does your code correctly measure if a person has two monitors? If the height of the browser is less than the height of an ad? If the person is using IE 6? If a virtual page change occurs or an ad refreshes, are you attributing measurements to the right entity?
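To make the first of those mistakes concrete, here’s a toy contrast between the flawed binary-per-interval accounting and per-second accounting (illustrative code, not the production tracker):

```javascript
// A 15-element array, one entry per second of a ping interval,
// true if the visitor was engaged during that second.
const interval = [
  ...Array(14).fill(false),
  true, // visitor engages only in the final second before the ping
];

// Flawed early approach: any engagement in the interval => full credit.
const binaryCredit = interval.some(Boolean) ? interval.length : 0;

// Per-second approach: count only the seconds actually engaged.
const perSecondCredit = interval.filter(Boolean).length;

console.log(binaryCredit);    // 15 -- wildly overcredits
console.log(perSecondCredit); // 1
```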
- When a visitor is active on a page, we ping data back every 15 seconds of wall clock time.
- If they go idle, by definition our numbers (except raw time on page) aren’t updating, so there’s no need to ping as frequently. We use an exponential backoff — pinging every 30 seconds, then 60 seconds, then two minutes, etc — and immediately bounce back to our 15 second timing if a visitor reengages with the page.
- When a qualifying event occurs, we ping that data immediately to ensure that we record all instances of these events. Currently, the qualifying events are a new ad impression being served and the IAB viewability standard being met for an ad.
- We also hook into the onbeforeunload event, which fires when a visitor exits a page, and send a “hail Mary” ping as the person exits — because the page is unloading, there’s no guarantee that the ping request will complete, but it’s a best-effort attempt to get the visitor’s final state across in the case that they never visit another page on the site (in which case no later ping would ever carry that final state).
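The idle backoff described in the second bullet can be sketched as a pure scheduling function (intervals in seconds; a simplified illustration of the rule, not our client code):

```javascript
// Next ping interval, in seconds. Active visitors ping every 15s;
// idle visitors back off exponentially (30s, 60s, 120s, ...), and any
// re-engagement snaps the schedule back to 15s.
function nextPingInterval(isActive, idlePingCount) {
  if (isActive) return 15;
  return 30 * 2 ** idlePingCount;
}

console.log(nextPingInterval(true, 0));  // 15
console.log(nextPingInterval(false, 0)); // 30
console.log(nextPingInterval(false, 1)); // 60
console.log(nextPingInterval(false, 2)); // 120
console.log(nextPingInterval(true, 5));  // 15 -- visitor reengaged
```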