Archive for October, 2013

It’s easy to get excited about metrics, measuring content effectively, and the related tools and features that accompany these advancements in thinking and strategy. That said, it’s always nice to take a step back and think about how journalism in the context of analytics is evolving in a broader sense. Tow Fellow Caitlin Petre frames some interesting ideas about quantifying journalism in her blog post for the Tow Center blog, and we’re reposting it here for you all to enjoy. Throw your ideas in the Comments section – we’d love to hear what you’re thinking.

This post originally appeared on the Tow Center for Digital Journalism Blog on October 30th, 2013.

Journalists are seeing an explosion of quantitative data about how readers interact with what they write. To date, much of the conversation about metrics and news has focused on the dangers of using metrics to guide news judgment or, on the other hand, the risks of ignoring metrics completely. But crucial empirical questions about how metrics are produced and put to use remain largely unanswered. How do analytics firms measure complex qualities like engagement, make predictions about the future performance of content, and communicate with journalists about the value of metrics? And how do journalists at different types of news organizations use analytics in their day-to-day work? Are increasingly sophisticated measures of stories’ performance shaping journalists’ ideas about what is important, interesting, or newsworthy? Has the availability of such data changed the internal dynamics of news organizations?

These are some of the questions I aim to tackle in my Tow project, using qualitative methods, such as ethnographic observation and in-depth interviews, to better understand the development and use of metrics in an analytics firm and two news organizations. But before I can answer these questions, I have to ask a different one – one that is as dreaded in my field (sociology) as it is common: What is this a case of? Even though researchers have a tendency to become infatuated with the most minute details of our subjects, what we’re ultimately trying to do is identify and account for patterns in the social world. The classic “what is this a case of?” question prods us to zoom out, to put things in context, to consider the broader implications of whatever it is we’re studying.

So, what are metrics a case of? While there is a burgeoning movement to measure journalism – especially non-profit and investigative pieces – more qualitatively, most metrics categorize and count – page views, unique visitors, time on page, drop-off rate – with the aim of comparing things: pieces of content, news organizations, authors, readers. Metrics, and the big data trend of which they are a part, represent what philosopher Ian Hacking calls “an avalanche of numbers” made possible by astonishing advances in the ability of computers to collect, store, and process huge amounts of data. And metrics aren’t the only numbers in the avalanche – more and more journalists are now deriving stories from their analysis of quantitative datasets.

Sociologists Wendy Espeland and Mitchell Stevens have argued that quantification, like speech, is “a social action that…can have many purposes and meanings” that arise and shift through use. Scholars who study quantification have sought to uncover these purposes, meanings, and uses: here are two ideas from their research that can guide our thinking about the role of numbers in the production of contemporary news. The first concerns metrics, the second concerns the growing prevalence of data journalism.

  • Numbers can discipline, even in cases where they aren’t intended to. Michel Foucault famously argued that statistical measures have been used in attempts to control and “normalize” populations that were considered deviant. But sometimes numbers that were meant merely to measure inadvertently serve a disciplining function. U.S. News and World Report’s law school rankings were designed to provide prospective students with better information about schools they were considering, but Wendy Espeland and Michael Sauder discovered that law school personnel internalized these measures and began to change their admissions and financial aid policies to better conform to them. In other cases, however, the implementation of quantitative accountability measures faces considerable resistance and resentment, as in Tim Hallett’s ethnographic study of the faculty at an urban elementary school. My preliminary research (as well as work by C.W. Anderson, Pablo Boczkowski, and others) suggests that both internalization and resistance are present in journalists’ response to metrics.
  • Numbers can establish trust where it is lacking. Historian Theodore Porter has argued that because of numbers’ longstanding association with rationality and objectivity, quantification can be a useful “strategy for overcoming…distrust,” especially in professional fields that are susceptible to outside criticism. At a time when public trust in journalism has dropped precipitously, then, we might expect the standards of journalistic evidence to become increasingly quantitative. As Tim Berners-Lee puts it in The Data Journalism Handbook, “it used to be that you would get stories by chatting to people in bars, and it still might be that you’ll do it that way sometimes. But now it’s also going to be about poring over data and equipping yourself with the tools to analyze it and picking out what’s interesting.”

Contemplating the increasingly important role of quantitative data in journalism also leads us to interesting questions about the idea of “objectivity,” and specifically about the relationship between scientific and journalistic definitions of this term: How do they overlap? Where do they conflict? If we are indeed seeing a quantitative turn in journalism, will it push these two conceptions of objectivity to be reconciled? These are questions I’ll address in future posts.

Caitlin Petre is a Tow Fellow working on a project on Metrics: Production and Consumption for the Tow Center for Digital Journalism. The Metrics: Production and Consumption project is made possible by generous funding from both The Tow Foundation and the John S. and James L. Knight Foundation. To learn more about the Tow Center Fellowship Program, please contact the Tow Center’s Research Director Taylor

Chartbeat Rising

A few weeks ago we launched the new Chartbeat Labs page and with it introduced our latest hack project that’s totally free and open to everyone on the internet, whether you’re a client or not: Chartbeat Rising. Like all of our Labs projects, Rising was conceived in one of our Hack Weeks, where our team gets a week between product development cycles to learn new dev skills and build something awesome with our data.

During one particular Hack Week (way back when), Isaac, one of our data dudes, decided to dive into aggregating the topics that sites using Chartbeat were writing about. Not just to see what was popular, but to see what was actually holding people’s attention. Turns out, there’s a huge difference between what people click on and what they spend time reading.

So that data, plus design, front-end genius, and plenty of TLC from Chartteam all-stars Danny and Meagan, gave us the Rising prototype you see today.

Let me give you a quick tour:

1. Bubble Topics

Data from the sites that allow us to aggregate and anonymize it is pulled into a big data pile (that’s the technical term for it) and sorted by category: news, entertainment, or technology. The topics within each category are then ranked either by popularity (the topics with the most concurrents) or by Engaged Time (the topics people spend the most time actually reading about). Click the toggle at the top right and you’ll usually see a pretty big difference between the two rankings.
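To make the toggle concrete, here is a minimal sketch of ranking one category’s topics two ways. It is not Rising’s actual code: the `TopicStats` record, its field names, and the assumption that “most Engaged Time” means total engaged seconds per topic are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TopicStats:
    # Hypothetical per-topic aggregate; field names are illustrative only.
    name: str
    category: str           # e.g. "news", "entertainment", "technology"
    concurrents: int        # readers currently on pages about this topic
    engaged_seconds: float  # assumed: total Engaged Time across those readers


def rank_topics(topics, category, by="concurrents"):
    """Return topic names in one category, best first, ranked by the chosen metric."""
    if by not in ("concurrents", "engaged_seconds"):
        raise ValueError(f"unknown ranking metric: {by}")
    in_category = [t for t in topics if t.category == category]
    in_category.sort(key=lambda t: getattr(t, by), reverse=True)
    return [t.name for t in in_category]


# Made-up numbers, for illustration only.
topics = [
    TopicStats("celebrity gossip", "entertainment", 900, 20_000.0),
    TopicStats("film review", "entertainment", 300, 45_000.0),
    TopicStats("tv recap", "entertainment", 500, 30_000.0),
]
print(rank_topics(topics, "entertainment", by="concurrents"))
# ['celebrity gossip', 'tv recap', 'film review']
print(rank_topics(topics, "entertainment", by="engaged_seconds"))
# ['film review', 'tv recap', 'celebrity gossip']
```

With these invented numbers the most-clicked topic drops to last place once you rank by attention, which is the kind of flip the toggle is meant to surface.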

2. Wiggly Movements

My favorite part, so I have to cover it, is the bubble movement. Since Rising is all about the topics rising to the top, the movement shows how the bubbles interact with each other: the biggest, highest-ranked bubbles wiggle their way around the others to rise to the top.

3. Topic Context

Why the hell is “telekinesis” a top tech story? Fantastic question. Click the bubble and find out. Since all our data is anonymous, the headlines in the callout box aren’t necessarily from Chartbeat sites; they’re pulled from a search API to give context for the topics Rising presents as relevant. That may change in the future as we iterate.

Like all our Labs projects, Chartbeat Rising is hacky and will probably break every once in a while, but that also means there’s so much more we could build onto this guy. In just the first couple of weeks it’s been live, we’ve already had requests to see topics sorted by a single URL or by Geo (Hi, Anjanette!). So I hope you share it around and give us loads of feedback on what you’d want to see it do.


This post is part three in our ongoing series on traffic sources. In part one, I talked about how we classify traffic and introduced some basic metrics for understanding the quality of traffic; in part two, we dove into some details on direct traffic. Today, I’ll talk about traffic from social sharing.

Overall, about 26% of the traffic we measure comes from social sources (Facebook, Twitter, and email, for example), making social the second most significant source of traffic after direct. In some sense, social traffic and direct traffic are polar opposites. Visitors who arrive via your homepage are, critically, people who intended to visit your site specifically rather than a particular piece of content. Those who come from social sources may or may not know what site they’re landing on; they’re coming because of an article that’s been recommended to them. That’s a double-edged sword. On the one hand, social visitors are more likely than other visitors to actually read the pages they land on; on the other, they’re also among the least likely to return to your site, and when they do return, they’re very likely to come only via the same social channel.

Social is also categorically different from other sources of traffic because it’s the only channel that’s easily influenced. While converting visitors to come directly to your homepage is an art, and affecting search engine placement leaves much to chance, we can actively choose which articles we put on social media and when to post those links.


Before we talk about evaluating social traffic, it’s worth discussing what sort of visitors come from social sites and how they read. First off, social sources are a better-than-average source of new visitors: while an average of 31% of a site’s traffic comes from new visitors (those who haven’t visited in the past 30 days), an average of 41% of social visitors are new.
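The “new visitor” definition above can be sketched as a simple 30-day lookback. This is an illustration, not Chartbeat’s implementation: the function names are hypothetical, and treating a previous visit exactly 30 days ago as *returning* is an assumption about the boundary.

```python
from datetime import datetime, timedelta

# Lookback window from the definition above: no visit in the past 30 days.
NEW_VISITOR_WINDOW = timedelta(days=30)


def is_new_visitor(visit_time, previous_visit_time):
    """True if the visitor has no earlier visit inside the 30-day window.

    Assumption: a previous visit exactly 30 days back still counts as returning.
    """
    if previous_visit_time is None:
        return True
    return visit_time - previous_visit_time > NEW_VISITOR_WINDOW


def new_visitor_share(visits):
    """visits: list of (visit_time, previous_visit_time or None) pairs."""
    if not visits:
        return 0.0
    new = sum(1 for now, prev in visits if is_new_visitor(now, prev))
    return new / len(visits)


# Hypothetical sample of four social arrivals.
now = datetime(2013, 10, 30, 12, 0)
sample = [
    (now, None),                       # never seen before -> new
    (now, now - timedelta(days=45)),   # last seen 45 days ago -> new
    (now, now - timedelta(days=2)),    # last seen 2 days ago -> returning
    (now, now - timedelta(days=10)),   # last seen 10 days ago -> returning
]
print(new_visitor_share(sample))  # 0.5
```

Computing this share separately for social arrivals and for all traffic is one way to produce a comparison like the 41% versus 31% figures quoted above.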



Social traffic is also dramatically more mobile-heavy than other traffic: on average, 25% of traffic is on mobile, but on many sites over 40% of social traffic is mobile. That should affect what stories you push to social media and when you push them. We’ll cover both of those topics below.



Social engagement versus on-site engagement

People frequently take social media interactions as the de facto standard for “engagement” with a piece. The idea is that people who share a piece are likely to have enjoyed it. While there’s some kernel of truth here, our data suggests that there’s more to the engagement story than raw counts of tweets and likes.

Take a look at the graph below, which was first presented in Slate:

This graph shows how fully people read an article (as measured by how far down the page they scrolled; all articles shown were over 3000 pixels tall) compared to how frequently they tweet about it. If the stories people read most thoroughly were also the stories most likely to be shared, we’d expect the points to cluster along an upward-sloping line. Instead, we see essentially no correlation between the two numbers. That doesn’t mean social interactions are a bad way to measure engagement, but it does show that social engagement and on-site engagement are often different phenomena.
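“Essentially no correlation” can be checked with a standard Pearson correlation coefficient. The sketch below computes it from scratch; the per-article numbers (average percent of page scrolled, tweets per 1,000 page views) are invented to illustrate a coefficient of zero and are not the data behind the Slate graph.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical articles: deeper reading does not go with more tweeting.
scroll_depth_pct = [20, 40, 60, 80]    # average percent of page scrolled
tweets_per_1k = [2.0, 1.0, 1.0, 2.0]   # tweets per 1,000 page views
print(pearson_r(scroll_depth_pct, tweets_per_1k))  # 0.0
```

A value near +1 would mean the most thoroughly read stories are also the most tweeted; a value near 0, as here, means knowing one number tells you little about the other.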

Timing of social posts

So, what makes for successful social content? Much has been written about how to write successful social posts; most recently, I read a great study by Knight fellow Sonya Song and its more concise writeup on Nieman Lab. Tackling what content to put in your social posts is beyond the scope of this post, but one question we’re frequently asked is what time of day is best for social sharing. Below is a chart showing how social traffic compares to overall traffic across a set of sites (all based in EST) over the past week.


Unsurprisingly, the shape of social traffic closely follows that of overall traffic, but it’s notable that social traffic substantially underperforms overall traffic from about 5am to noon and substantially overperforms it from about 3pm until 1am. From the perspective of driving traffic to your site, late afternoon through night appears to be the best time to reach your readers on social media and get them to click through.
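One way to make “underperforms” and “overperforms” precise is to compare, hour by hour, social’s share of its own daily total with overall traffic’s share of its daily total. The sketch below is an assumed method, not necessarily how the chart above was built; the function names, the 10% threshold, and the synthetic hourly counts are all hypothetical.

```python
import math


def hourly_social_index(social, overall):
    """Per hour: (social share of daily social) / (overall share of daily overall).

    Above 1.0, social over-performs at that hour; below 1.0, it under-performs.
    Hours with no overall traffic get NaN.
    """
    if len(social) != 24 or len(overall) != 24:
        raise ValueError("expected 24 hourly buckets")
    social_total = sum(social)
    overall_total = sum(overall)
    return [
        (s / social_total) / (o / overall_total) if o else math.nan
        for s, o in zip(social, overall)
    ]


def flag_hours(index, threshold=0.1):
    """Split hours into over- and under-performers beyond +/- threshold."""
    over = [h for h, v in enumerate(index) if v > 1 + threshold]
    under = [h for h, v in enumerate(index) if v < 1 - threshold]
    return over, under  # NaN hours fall into neither list


# Synthetic week-averaged counts: flat overall traffic, social heavier after noon.
overall = [100] * 24
social = [10] * 12 + [30] * 12
over, under = flag_hours(hourly_social_index(social, overall))
print(over)   # hours 12 through 23
print(under)  # hours 0 through 11
```

Normalizing each series by its own total keeps the comparison about *shape* rather than volume, since social is only a fraction of overall traffic.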



Interestingly, this trend appears to hold despite publishers’ best efforts to the contrary. Below is a graph of how frequently these sites posted to Twitter, compared to their social traffic.



Posting to Twitter is strong all morning and peaks just before noon, even though social traffic is actually strongest later in the day.

Return frequency

While we’re on the subject of timing, it’s worth noting that visitors who come to a site from social sources do so an average of 1.5 times per week. Below is the distribution of how many times a visitor arrives from social sources over a week.


About 82% of visitors who come from social come only once, but there’s a long tail of people who come two or more times.
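A weekly average of 1.5 visits and a large majority of one-time visitors can both be true because a few frequent visitors pull the mean up. The sketch below summarizes a list of per-visitor weekly arrival counts; the function name and the ten-visitor sample are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def visit_frequency_summary(visits_per_visitor):
    """Summarize one week of social arrivals, given one count per visitor.

    Returns (mean visits, share of visitors who came exactly once,
    distribution mapping visit count -> share of visitors).
    """
    if not visits_per_visitor:
        raise ValueError("need at least one visitor")
    n = len(visits_per_visitor)
    counts = Counter(visits_per_visitor)
    mean_visits = sum(visits_per_visitor) / n
    share_once = counts[1] / n
    distribution = {k: counts[k] / n for k in sorted(counts)}
    return mean_visits, share_once, distribution


# Hypothetical week: eight one-time visitors plus a short long tail.
sample = [1] * 8 + [2, 5]
mean_visits, share_once, distribution = visit_frequency_summary(sample)
print(mean_visits)   # 1.5
print(share_once)    # 0.8
print(distribution)  # {1: 0.8, 2: 0.1, 5: 0.1}
```

Here 80% of the invented visitors come once, yet a single five-visit reader is enough to lift the mean to 1.5.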

As mentioned above, almost 80% of visitors who come to your site from a social source will come to your site only via that source. That figure is particularly stark for visitors from Twitter, only about 16% of whom will return to your site directly. These are significant numbers to consider as you decide where to invest time and resources in developing your audience.
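Both figures in the paragraph above are shares over the visitors a given channel reached. The sketch below computes them from a mapping of visitor to the set of channels they arrived through. The data layout, the channel labels, and the sample visitors are assumptions for illustration; the post does not say what measurement window Chartbeat used.

```python
def single_channel_share(visitor_channels, channel):
    """Among visitors who ever arrived via `channel`, the share who arrived only via it."""
    reached = [chans for chans in visitor_channels.values() if channel in chans]
    if not reached:
        return 0.0
    only = sum(1 for chans in reached if chans == {channel})
    return only / len(reached)


def direct_return_share(visitor_channels, channel):
    """Among visitors who arrived via `channel`, the share who also came directly."""
    reached = [chans for chans in visitor_channels.values() if channel in chans]
    if not reached:
        return 0.0
    direct = sum(1 for chans in reached if "direct" in chans)
    return direct / len(reached)


# Hypothetical visitors and the channels each arrived through in the window.
visitor_channels = {
    "a": {"twitter"},
    "b": {"twitter", "direct"},
    "c": {"twitter"},
    "d": {"facebook"},
    "e": {"twitter"},
}
print(single_channel_share(visitor_channels, "twitter"))  # 0.75
print(direct_return_share(visitor_channels, "twitter"))   # 0.25
```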



This post barely scratches the surface of what can be said about social media (entire companies exist to help optimize social strategy), but I hope it has started you thinking about how social sharing relates to your site’s overall traffic. We’ll save further discussion of social traffic for a future post; in the meantime, stay tuned for the next post in our traffic sources series, where we’ll cover external and search traffic.

Questions? Throw them in the Comments section and I’ll respond.



“It’s been the feeling that following metrics too closely is corruptive to good quality journalism,” Haile said. “I think if you’re following the wrong metrics that’s true.”

Salon has a big story today about Chartbeat and our CEO Tony Haile. Writer Alex Halperin discusses online journalism in the era of listicles, click-based metrics, and frequent debate regarding what topics merit coverage, and how Chartbeat may continue to shake things up in the industry – for the better, we think(!). The article features real talk with Tony about measuring content quality and value through audience engagement, click-bait journalism, and where he thinks online publishing is heading.

The whole article is on Salon; enjoy the excerpt below. If you have questions or comments, tweet at Tony. He’d love to hear from you.

But as Haile presents it, Chartbeat wants to change the data editors and, more importantly, advertisers care about. He thinks this could improve journalism’s quality by reducing the incentive to write click-bait headlines, produce unnecessary slideshows, pointlessly paginate articles and indulge in other chicanery to inflate page views.

Raising page views for its own sake, “Doesn’t help the audience,” Haile said. “The advertiser doesn’t get anything more from it. It’s just a way of gaming the numbers.”

“If [a headline reads] ‘Prince William caught in love triangle,’ it doesn’t matter what the story says,” Haile said. “I’ve got that click, I’ve got that page view. So it lends itself to lower quality.” But in a media climate where every post is judged on its own terms — whether it’s a war zone dispatch or a curated list of tweets about “Mad Men” – how can quality be measured?

Haile thinks the crucial metric should be time, how long a page captures readers’ attention. He believes that articles that engage readers, and are therefore more likely to create a loyal audience, should be worth more to advertisers. That might sound simple, but almost two decades into the era of online media, the industry hasn’t been able to make that happen.

– From “This man decides what you read”, Salon

On Wednesday, we hosted a webinar, “Preparing for the Data-Driven Future of Publishing,” with Tony, our CEO, and Joe, our product owner. They talked quite a bit about the next evolution of Chartbeat Publishing, and how publishers can start building loyal and returning audiences. I wanted to share a few slides with y’all, in case you missed the event or just wanted the highlights. These suckers are jam-packed with data and insights. Flip through, share with your team, and of course, don’t be shy about reaching out with questions. Enjoy!