Author Archive

We are all barraged by numbers every day. It is my job, as a data scientist, to ensure that the statistics I report are accurate. The quality and accuracy of Chartbeat data are of utmost importance to me, and these are things I take very seriously. I am proud of the hard work that our team does to ensure that the information we provide to you is as accurate as humanly possible.

Here’s the thing: I made a mistake.  Last week, I presented findings at a GEN Summit Masterclass and after further analysis, I have discovered that several of the numbers that I presented, and which I subsequently blogged about, warrant corrections. I calculated these numbers via a custom data pull and analysis, and there was human error involved.  I have since rerun the analysis, corrected the faulty numbers, and have reposted my Masterclass slides to Slideshare with an intro calling out the changes.

Below are the key corrections:

  • Socially loyal visits are not greater than directly loyal visits as initially presented.
    • For both the EU and the US, the number of socially loyal visits is only a third of directly loyal visits. If you hold out Dark Social from this number and consider only social traffic with a specified referrer (which you may or may not want to do) the numbers drop to less than 5% for the EU and 10% for the US. That is, socially loyal visits are much, much less common than directly loyal visits.
  • Mobile visits to an article are still extremely high, but slightly lower than initially presented.
    • Over 70% of mobile visits in the US and over 60% of mobile visits in the EU were to an article. (The previous finding was that 80% of mobile visits in both regions were to articles.)

I apologize, but I’m glad to have the chance to make these corrections.  If you have any questions, feel free to email me or tweet me @dpvalente.

Guten Morgen!  I’m sitting here in the Vienna airport on my way back from this year’s GEN Summit, so I thought I’d use this opportunity to send a few interesting data bits your way.

The theme of this year’s conference was the rise of platform-driven news, and the role that the large tech companies are now playing in the media industry. Because of this, we thought it provided a great opportunity to talk about what we’re seeing across the Chartbeat network regarding the data behind distributed content, and how these platforms (i.e., Facebook & Google) are driving traffic, engagement, and user loyalty.  For those of you who couldn’t attend the Chartbeat Masterclass at the conference, head over to Slideshare to view the slides from my presentation.   

Here are a few takeaways you’ll find in the slides:

  • More people are now visiting sites on mobile devices than on desktop devices.  Sure, you probably already knew this.  But did you know that in the EU, 48% of visits in May were from mobile devices, compared to 45% on desktops?  In the US, 51% of visits were on mobile devices, with 43% on desktops.
  • The primary mode of behavior for a mobile audience is to land on an article page (and visit only a single article page).  Over 80% of mobile visits in both the EU and the US were to an article, and these visits typically come from Facebook (although traffic from Google is close behind).
  • US audiences spent 30% more time engaged with articles on mobile devices than on desktops.  In the EU, however, the reverse is true.  For the EU, desktop-direct-to-landing page traffic is still strong with about 25% higher engaged times.

And my favorite:

  • In May 2016, 50% more US users and 20% more EU users were socially loyal than directly loyal.  By socially loyal, I mean users who visit approximately every other day and come from a social referrer. This challenges our long-held notion that the “true” loyal audience is the audience that comes directly to our homepage. To me, this provides hope that in this mobile, article-first, driven-by-platforms world, quality content still prevails and will build a loyal audience.

If you have any questions, feel free to drop me a note by email or tweet me @dpvalente.


Two weeks ago, Facebook opened up its Instant Articles platform to all publishers. And last week, Facebook announced that they will be updating their News Feed algorithm once again. The most recent change to their algorithm will look at predictions of whether a user in the Facebook mobile browser or on an Instant Article page will click into an article and actually read that article. Time spent viewing the article will continue to be a large factor in News Feed rankings.

When Facebook makes changes, the publishing industry reacts with questions and concerns (see, for example, here, here, or here). That said, each time we here at Chartbeat have looked at Facebook referrer traffic in response to one of these changes, we haven’t seen any major effect across our network.  Here’s what the median percentage of traffic from Facebook looks like across our network so far this year:


Besides the typical weekday / weekend variations, traffic from Facebook is remarkably stable. We see Facebook driving between 40% and 50% of traffic on mobile devices during peak weekday traffic and about 12% to 15% of traffic on desktop devices during peak weekday traffic (note that these numbers exclude cases in which we have no referrer data, as is the case for dark social). Even given the two big changes that happened this month, we are actually seeing a slightly higher-than-normal amount of referral traffic! This increase is on the order of 3-4% for the median (and smaller, 1%, if you look at the average).

It is important to note that these curves show the median proportion across our network. Individual sites respond in different ways, so you may have seen your traffic rise or fall in response to one of these changes. Sitting in the newsroom, it is hard to see the forest for the trees, so to speak; we, on the other hand, have the power of statistics on our side. And from what we continue to see, the majority of publishers are incredibly adept at responding to Facebook’s changes and are keeping referral numbers high.
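To make the median-versus-average distinction concrete, here is a minimal sketch. The per-site numbers are invented for illustration and are not Chartbeat data, and the helper name `facebook_share` is hypothetical. It computes each site's Facebook share of visits that have a known referrer, then summarizes the network with both statistics.

```python
from statistics import mean, median


def facebook_share(facebook_visits, visits_with_referrer):
    """Fraction of a site's referred visits that came from Facebook."""
    if visits_with_referrer <= 0:
        raise ValueError("need at least one visit with a known referrer")
    return facebook_visits / visits_with_referrer


# Hypothetical (facebook_visits, visits_with_known_referrer) pairs for five sites.
# Visits without referrer data (dark social) are assumed to be excluded already.
sites = [(450, 1000), (480, 1000), (500, 1000), (520, 1000), (50, 1000)]
shares = [facebook_share(fb, total) for fb, total in sites]

network_median = median(shares)  # middle site; barely moved by the one outlier
network_mean = mean(shares)      # pulled down by the site with very little Facebook traffic

print(f"median share: {network_median:.2f}, mean share: {network_mean:.2f}")
```

In this toy network a single site with very little Facebook traffic drags the average well below the median. That is one reason a network-wide median can look stable even while individual sites move quite a bit.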

My headphones are in, and I’m listening to Jóhann Jóhannsson’s The Miners’ Hymns, one of my favorite albums for coding. I’m finishing up an API for our new Heads Up Display (HUD), for which I’d worked out the math a few days earlier. I had spent the previous day figuring out how to implement the math and testing out edge cases with synthetic data, interspersed between product planning meetings and debugging a performance issue with a new component of the HUD backend. I’m about to put out a Pull Request, when I take a look at Nagios and notice that one of the systems that powers the current HUD has just gone critical. I start to debug, and a second later I get a Slack message from someone on Chartcorps saying that customers are starting to notice that the HUD is down. I see that it is a simple fix this time; I just have to restart one of the services that powers the HUD.

Just in time, too, because I have to head uptown with members of our sales team to talk to one of our strategic clients about our new headline testing product. On my way out the door, one of our designers pulls me aside to look at the current designs for displaying the results of headline tests in the new HUD: “Does this viz accurately represent the data?” We talk for ten minutes, weighing pros and cons and looking at design alternatives. We talk about color schemes. We talk a bit about user interaction.

The meeting uptown goes wonderfully; I give a high-level overview of multi-arm bandit headline testing, answer some technical questions about the product, and get great feedback about the product to take back to the team. When I get back in the office, I see a message from Lauryn Bennett, our Head of Brand, asking if any of us on the Data Science team have time to answer a request from a journalist about a news event that just happened. This particular request doesn’t require an in-depth statistical analysis, so I write a quick script to pull the numbers. I spend a bit of time looking at the results and then write up a few paragraphs describing what I’ve found. I then head into a meeting with fellow engineers, designers, and product owners to plan our next Sprint.

This is my typical day.



According to the Harvard Business Review, data science is the sexiest job of the 21st century. If you have data, you need a data scientist to get value from it; data scientists are the only ones who can wrangle #BigData into submission. Apparently, data science will save us all.

I’ve read many pieces over the past year trying to describe what data science actually is. There’s usually some talk about math and programming, machine learning, and A/B testing. Essentially these pieces boil down to one observation: data scientists do something with data. #DeepLearning anyone? I’ve followed arguments on Twitter and blogs about who should and should not be considered a data scientist. Is Data Science even a new discipline? How does it differ from Statistics? Programming? Or is it this…


Ok, then, what the hell does a data scientist actually do?

Now this is a question I can answer. And since I haven’t read many concise descriptions of what data scientists do day-to-day, I figured that I’d throw my hat into the ring and talk about the kind of data science we do here at Chartbeat.

“WARNING: it may be a bit different than what you might have heard that data scientists typically do for a living.”

OK, so, what exactly do you do?

Our team here at Chartbeat are what I like to call Product-Centered Data Scientists — meaning the majority of things we do on a daily basis are in direct support of our products. Because we are a data company, our role is pretty central to the organization. Of course, we do math. We build data pipelines and write production code. We do all kinds of analyses. But we also work regularly with sales and marketing. We go on customer visits and help out with sales calls. We even participate in user research with our designers, UX, and product owners.


As a tech company, we build software products. Plain and simple. As a data company, every one of those products has a data science need. Because of this, our team is embedded within the engineering team, and most of us take on heavy backend or front-end roles in putting code into production. We don’t just hand prototypes over to engineering for them to implement. We do the implementation. We tune our Redshift clusters, find API performance bottlenecks, and choose the proper data structures. We are also part of the backend on-call rotation. If Chartbeat were to break at 2AM, we’d help fix it.

For example, just consider our Engaged Headline Testing tool. Andy Chen and Chris Breaux have been instrumental in designing, building, and maintaining the systems that power headline testing. Andy worked out the initial math for adding Engaged Time into the multi-arm bandit framework and was one of two people who built the initial backend. Chris Breaux has since taken over the Data Science role on the team and continues to push the math, and the product, to new places. The new features that will be released soon in that product are — in no uncertain terms — data science features.
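For readers who have not met bandit-style testing before, here is a minimal sketch of one common approach, Thompson sampling with Beta posteriors over click rates. This is an illustration under stated assumptions, not the algorithm behind our product: the text above does not say which bandit strategy is used, and the sketch does not model Engaged Time at all. The class name `HeadlineBandit`, the `simulate` helper, and the click rates are all hypothetical.

```python
import random


class HeadlineBandit:
    """Beta-Bernoulli Thompson sampling over a set of candidate headlines."""

    def __init__(self, headlines, seed=None):
        self.rng = random.Random(seed)
        # One [successes + 1, failures + 1] pair per headline: a uniform Beta(1, 1) prior.
        self.stats = {h: [1, 1] for h in headlines}

    def choose(self):
        """Draw a plausible click rate for every headline and show the best draw."""
        draws = {h: self.rng.betavariate(a, b) for h, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def record(self, headline, clicked):
        """Update the shown headline's posterior with one observed outcome."""
        self.stats[headline][0 if clicked else 1] += 1


def simulate(true_rates, visits=5000, seed=0):
    """Run the bandit against hypothetical true click rates.

    Returns how many times each headline was shown.
    """
    bandit = HeadlineBandit(true_rates, seed=seed)
    outcome_rng = random.Random(seed + 1)
    shown = {h: 0 for h in true_rates}
    for _ in range(visits):
        headline = bandit.choose()
        shown[headline] += 1
        bandit.record(headline, outcome_rng.random() < true_rates[headline])
    return shown


if __name__ == "__main__":
    print(simulate({"Headline A": 0.05, "Headline B": 0.20}))
```

The appeal of this family of methods is that traffic shifts toward the better headline while the test is still running, rather than being split evenly until the end. An engagement-aware variant might count a "success" only when a click is followed by some engaged time, but that definition is an assumption made here for illustration.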

In fact, all of us play central roles in each of the products with which we are associated. Josh Schwartz and Justin Mazur have built an enormous portion of our Ads Suite, Kris Harbold and Josh have built all of Report Builder, and Kris holds the distinction of being our only team member to have both front-end and backend code in production. Justin and I have worked on our Video Dashboard, and I’ve built a lot of the HUD. Each of us has contributed countless lines of code to all sorts of systems across Chartbeat.

“I don’t think it is an exaggeration for me to say that there is not a part of Chartbeat code that a data scientist has not touched.”


Okay, so we do math. This just comes with the territory. Sometimes sophisticated math, sometimes rote math. This math is either in direct support of a product or is part of an analysis we’re working on. We don’t do math every day, but when math is needed, we are there to answer the call.

Research + Analysis

Analysis is typically thought of as an essential skill of a data scientist, and we definitely do our fair share. These analyses range from customer specific reports to industry-wide analyses to analyses that inform a specific product build. Take, for example, the analysis Chris Breaux did on Dark Social traffic, or the countless studies Josh Schwartz, our Chief Data Scientist, has published on our blog and elsewhere. Or take, for instance, the research that Justin and Chris recently did towards updating our Engaged Time measurement methodology, the work Kris and I published on user journeys through websites, or the work Jeiran Jahani is doing to break new ground in topic detection. If there is a question that we can use our data to answer, we’ve likely been tasked with answering it. Sometimes our analyses take a few minutes; sometimes they take a few weeks. Sometimes we have to dig deep into our bag of tricks and pull out sophisticated statistical tools. Sometimes we write simple SQL queries to calculate averages.

User Interviews + Ethnographic Research

With our product designers and product managers, some of us on the data science team sit in on user interviews and do ethnographic research. This is not something that I’ve seen as common to data scientists at other organizations, but I think it is an incredibly important activity for a product data scientist to participate in.

I know a lot of data scientists and engineers who roll their eyes at this kind of stuff, but understanding user goals helps in the design of a data pipeline, the choice of an algorithm, or the decision about which metric is best for a given application. It makes you empathetic to your user base, which is never a useless endeavor. What product-centered data scientists do is try to keep in our heads at all times exactly what has to happen to create an amazing user experience.

“From the ugly, messy data at the start of the pipeline, to the user’s interaction with the tool, the user is interacting with data, and that has to be in our purview.”

These interviews also give context for where you can be lax with assumptions, because you often have to make trade-offs when you try to implement your fancy models. Sometimes all that great math adds one second to the response time of an API, and when you have traffic like ours, sometimes you can’t afford one second. Knowing the fidelity that your users expect or require helps solve this problem.

When we were redesigning the HUD, I sat in a variety of newsrooms with one of our designers and watched editors work. We simply watched them use our product in their day-to-day flow, and asked questions now and again about what they were doing. I also sat in on a few user interviews during this time and have since sat in on countless others. Those experiences have influenced the engineering and data design of the HUD, as well as several other products I’ve helped build. And now, I can’t imagine being part of a product build without having done at least some user research.

Ideation + Future Products

Product-centered data science is not all about maintaining current systems or developing feature increments. There is also a large amount of long-term vision thinking. What will our products look like next year? In the next five years? Often, our team will prototype systems to test the feasibility of an idea or a product direction. We comb through the latest data science papers and computer science literature to see if any of the latest findings can be applied to future (or current) products. Once every six weeks, we set aside a week for our entire team to do data-specific projects that aren’t directly connected to current projects. We’ve built some cool stuff: a system that scrapes and searches content across our network, a tool that discovers popular stories in the news, a deep recurrent neural net to predict traffic, a Slackbot recommendation engine, you name it.

Sales + Marketing

Not only do we help design and build the products, but we do what we can to help sell them, too. We’ll often pull customer-specific numbers or industry benchmarks, or even do full-on reports for the sales team to use on sales calls. Sometimes we’ll even sit in on those calls and other client visits. We write blog posts and our Data Science Quarterly, which help the marketing team grow our customer base. We write product white papers. We give interviews to reporters. Basically, we are tasked with speaking to whomever on behalf of Chartbeat Data.

Product-Centered Data Science

This is product-centered data science, at least here at Chartbeat. Personally, I think every product team should have a data scientist on it. Data science is about storytelling, and so are product design, sales, and marketing. There are so many intersections in thinking that it just seems natural for us to be involved in all these parts of the business. I might be in the minority, but for me, data science really has nothing to do with #BigData. It has nothing to do with machine learning. It might not have anything to do with statistics. It is about asking questions, developing user empathy, creating an experience, and telling a story. Our medium is data, our medium is code, but the outcome is a fantastic product experience.

We’re always looking for great storytellers: whether data scientists, account managers, or backend engineers. Come join us.

Many publishers would likely argue that the design of the website is as important for enticing readers to engage with the content as the content itself—humans, unfortunately, do judge books by their covers. The Guardian, The Atlantic, and The Wall Street Journal are just a few of the many publishers who have redesigned their websites this year.

We wondered if we could use our data to give insight into just how important web design is—a concept we call “data-driven web design.” Are there aspects of a page’s design that correlate to increased traffic, and even better, increased engagement?

Font sizes and colors, link sizes, link density, interaction, responsiveness: These are elements we can analyze for their ability to draw traffic to content and perhaps even contribute (along, of course, with the content itself) to keeping people there. Do people prefer to read articles surrounded by few links, large fonts, and bright colors? Or, are sparse, simple sites with undecorated text better? For those of us keen on data, could you use these attributes to predict how many people will be drawn to the content?

Understanding how page elements relate to click-throughs is by no means a new idea. For as long as Google AdSense has been around, there have been all kinds of smart people who’ve tried to figure out just how ad size relates to click-through rates (CTR). But ads and articles are very different beasts. Do the same rules that hold true for ads hold true for articles? Does link size matter? Is it the only thing? Are there even any rules at all?

We here at Chartbeat like to focus on engagement, but as a first pass, we wanted to examine how the almighty click-through relates to the size and distribution of links on a homepage. We examined a measure of click-through probability, the clicks per minute per active visitor (CPV). The data used in this analysis is the same data that powers one of our most popular products, our Heads Up Display.
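As a rough sketch of what a CPV calculation might look like, assuming "active visitors" means the average number of concurrent visitors on the page during the measurement window (the text does not spell out the averaging), consider the following. The function name is hypothetical.

```python
def clicks_per_visitor_minute(clicks, active_visitors, minutes):
    """Clicks per minute per active visitor (CPV) for one link.

    clicks          -- clicks on the link during the window
    active_visitors -- average concurrent visitors on the page (an assumption)
    minutes         -- length of the measurement window in minutes
    """
    if active_visitors <= 0 or minutes <= 0:
        raise ValueError("active_visitors and minutes must be positive")
    return clicks / (active_visitors * minutes)


# Hypothetical example: 42 clicks in a 10-minute window with 600 people on the page.
cpv = clicks_per_visitor_minute(clicks=42, active_visitors=600, minutes=10)
print(f"CPV = {cpv:.3f}")
```

Normalizing by both time and audience size is what lets a link on a small site be compared with a link on a very large one.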

We looked at data from 294 publishing sites during several different times of day across several days to sample a variety of conditions. Much of what we found is not surprising: people click where the design guides them to click. For instance, the majority of clicks happen at page depths of 400 to 600 pixels, where most main content links are located (Figure 1). The other most probable places for clicks are the locations of menus on the left and right sides of the page. Nothing surprising here. As far as link sizes go, intuition holds as well: One would expect larger links (which likely represent headline articles) to drive greater traffic. This is certainly true. As a link’s area grows, generally so do the clicks per active visitor (Figure 2).



Larger links correlate with higher click-throughs, but what about link density? For sites with a lot of closely packed links, does this dilute click-through rates? After all, there are only so many concurrent users to split across content. As a proxy for density, we looked at the median distance between links on a site. The data shows that CPV decreases approximately linearly as the median distance between links grows from 450 pixels to about 2,000 pixels. Sites with more closely spaced links perform about two and a half times better than sites with distant links. It seems users prefer denser sites (Figure 3).
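One way such a density proxy could be computed is sketched below. The exact definition of "distance between links" is not given above, so this sketch makes an assumption: it takes each link's bounding box, measures the distance from its center to the center of the nearest other link, and reports the median of those distances. The function names and the sample boxes are hypothetical.

```python
from math import hypot
from statistics import median


def link_center(x, y, width, height):
    """Center point of a link's bounding box, in pixels."""
    return (x + width / 2, y + height / 2)


def median_nearest_link_distance(links):
    """Median over links of the distance to the nearest other link.

    links -- list of (x, y, width, height) bounding boxes in pixels
    """
    if len(links) < 2:
        raise ValueError("need at least two links to measure spacing")
    centers = [link_center(*box) for box in links]
    nearest = []
    for i, (cx, cy) in enumerate(centers):
        nearest.append(
            min(hypot(cx - ox, cy - oy)
                for j, (ox, oy) in enumerate(centers) if j != i)
        )
    return median(nearest)


# Hypothetical page: two links stacked closely, one further down the page.
page_links = [(0, 0, 100, 20), (0, 100, 100, 20), (0, 400, 100, 20)]
spacing = median_nearest_link_distance(page_links)
print(f"median nearest-link distance: {spacing:.0f}px")
```

Using the median rather than the mean keeps one isolated footer link from making a dense page look sparse. The pairwise comparison is quadratic in the number of links, which is fine for a single page but would need a spatial index for very large crawls.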


These two pieces of evidence seem to contradict each other, though, because the distance between large links is necessarily large (assuming, of course, the links aren’t nested!). You might think, “Wait… if I have a lot of large links, I’ll have huge CPV, but they will be spaced far apart, so I’ll have a small CPV!” But, in reality, the data is only reflecting a common website design principle—a few large links interspersed with many smaller, closely spaced links.

In fact, if you ponder these data long enough, it seems that we run into a chicken-and-egg problem. Click-throughs force a tautology. Design forces people to click in certain places, so they do. And we measure this. See why engagement matters?

In any case, the data back up our intuition when it comes to determining how many people will click through to a given piece of content. Given a large enough dataset in which you know where a link is on a page, its height and width, how many people are on the page, and how many are currently engaged with content, you could likely obtain a reasonable prediction for the CPV. And perhaps using this knowledge, one might use such a model to guide the redesign of a website.

We decided to try this (not the site redesign part, the modeling part!). Simple statistical models we have recently built can predict CPV for a link to within 0.007 clicks per minute per active visitor for 92% of links. This might seem impressive, but to put it in context, only four websites in the set we analyzed have a median CPV greater than this. There is much more work to do before we can really answer the question of whether design can predict attraction to and engagement with content, but the way forward is promising. Colors, font sizes, responsiveness: the design space is large. These can draw people in, but ultimately, it is the content that will keep people there.
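The caveat above is easier to see with a small sketch of the accuracy measure itself, the share of links whose predicted CPV lands within a fixed tolerance of the observed value. The numbers below are invented for illustration and the function name is hypothetical. The point is to compare a model against a trivial baseline that predicts zero for every link.

```python
def fraction_within(predicted, observed, tolerance):
    """Share of links whose prediction is within `tolerance` of the observed CPV."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    hits = sum(abs(p - o) <= tolerance for p, o in zip(predicted, observed))
    return hits / len(predicted)


# Hypothetical observed CPVs and model predictions for four links.
observed = [0.002, 0.003, 0.030, 0.022]
model = [0.001, 0.004, 0.010, 0.020]
baseline = [0.0] * len(observed)  # "always predict zero clicks"

model_score = fraction_within(model, observed, tolerance=0.007)
baseline_score = fraction_within(baseline, observed, tolerance=0.007)
print(f"model: {model_score:.2f}, predict-zero baseline: {baseline_score:.2f}")
```

When most observed CPVs are themselves smaller than the tolerance, even the predict-zero baseline scores well, so a headline figure like "within 0.007 for 92% of links" is only meaningful relative to such a baseline.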

So, the next time you are thinking of undergoing an overhaul or redesign, stare closely at your Heads Up Display. Think about link size, link density, and ask yourself what you can do to draw people into that fabulous content.