Archive for the ‘Data Science’ Category

Revisiting Return Rates

July 14th, 2014 by Josh

Starting today, we’ve updated our definition of return rate in both our Weekly Perspectives and in the Chartbeat Publishing dashboard. Consequently, you’re likely to see a shift in the numbers in your dashboard — so we wanted to write a quick note explaining the change, why we made it, and what you can expect to see.

Defining return rate

Return rate, if you’re not familiar with it, is a metric designed to capture the quality of traffic that typically comes from a referrer. It measures the fraction of visitors coming from a given referrer who return to a site later: if 1,000 people come to a site from, say, Facebook, should we expect 10 of them to come back or 500? Depending on the answer, we might interpret and respond to a spike from Facebook quite differently.

While the intuition behind return rate is straightforward, the actual formula used to calculate it is a bit more up for grabs. Up until now, we’ve calculated return rates using the following formula:

[Equation image not available: the previous, visit-based return rate formula]

That formula roughly captures a notion of “how likely is it, for a given visit from Facebook, that that visit will be ‘converted’ into a return?”

As we’ve talked through that definition over the past year, we’ve come to realize that it’s more natural to phrase returns in terms of people, not visits: to ask “how likely is it, for a given visitor from Facebook, that that person will be ‘converted’ into a return?” Hence, we’re now using the following calculation:

[Equation image not available: the new, visitor-based return rate formula]

So, rather than speaking in units of “visits,” this definition speaks in units of “visitors,” a seemingly small (but significant) change. In addition, we’re now only counting a return if it’s at least an hour after the initial entrance, which corrects for a pattern we sometimes see where visitors enter a site and then re-enter a few minutes later.
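As a rough illustration of the visitor-based definition described above, here is a short Python sketch that computes a return rate from a list of visit records. The record layout, the function name `return_rate`, and the choice to start the clock at each visitor's first entrance from the referrer are assumptions made for this example. It is not Chartbeat's production calculation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# A return only counts if it comes at least one hour after the entrance.
MIN_RETURN_GAP = timedelta(hours=1)


def return_rate(visits, referrer, min_gap=MIN_RETURN_GAP):
    """Fraction of visitors from `referrer` who come back at least `min_gap` later.

    `visits` is an iterable of (visitor_id, referrer, timestamp) tuples.
    (Hypothetical record layout, chosen only for illustration.)
    """
    visits_by_visitor = defaultdict(list)
    for visitor_id, ref, ts in visits:
        visits_by_visitor[visitor_id].append((ts, ref))

    entered = 0   # visitors with at least one entrance from the referrer
    returned = 0  # of those, visitors seen again after the minimum gap
    for records in visits_by_visitor.values():
        records.sort(key=lambda r: r[0])
        # Assumption: measure returns from the first entrance via the referrer.
        first_entry = next((ts for ts, ref in records if ref == referrer), None)
        if first_entry is None:
            continue
        entered += 1
        if any(ts - first_entry >= min_gap for ts, _ in records):
            returned += 1

    return returned / entered if entered else 0.0


# Made-up visits: two visitors arrive from Facebook, one returns hours later.
t0 = datetime(2014, 7, 14, 9, 0)
visits = [
    ("a", "facebook.com", t0),
    ("a", "direct", t0 + timedelta(minutes=5)),  # too soon, not a return
    ("b", "facebook.com", t0),
    ("b", "direct", t0 + timedelta(hours=3)),    # counts as a return
    ("c", "twitter.com", t0),
]
print(return_rate(visits, "facebook.com"))  # 0.5
```

Because the unit is the visitor, a person who arrives from Facebook five times and comes back once is counted once in both the numerator and the denominator.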

What's changing?

It’s likely that the return rate numbers in your dashboard and Weekly Perspectives will drop under this new definition. To help you sort out whether your numbers are trending up or down, we’ve gone back and recalculated reports using the new methodology, going back to the beginning of June. We hope that the transition to the new definition is painless, but if you have any questions, feel free to comment or get in touch with me at

Attention Web World Cup: Follow Along with Our Bracket for the Round of 16

June 27th, 2014 by Dan


After two weeks of intense international web engagement, our bracket for the Attention Web World Cup is set. Many of the groups came down to the very last game, and if you’ve been following along, you witnessed the excitement of Honduras narrowly edging out Switzerland by one second and the USA keeping their two-second halftime lead to defeat Germany and advance to the knockout stage of the Attention Web World Cup.

The first round looks to have some really exciting matchups, if the scores from the Group Round are any indicator. Nigeria, however, appears to be the clear favorite going into the next round.

There will be a few small rule changes for the Attention Web World Cup from here on out. First, there will be no draws allowed, so we’re throwing statistical significance out the door and awarding each match to whichever team has the higher median Engaged Time. Second, although the teams in the #AWWC are different than in the real World Cup, each match will still be played at the same time as the corresponding real World Cup match.

Keep on checking back for bracket updates and blog posts. Boa Sorte e divirta-se!

Attention Web World Cup: The Final Week of the Group Stage

June 23rd, 2014 by Dan


If you're anything like me, you're still in a zombie-like stupor over the last 30 seconds of Sunday night's USA-Portugal game. It's astounding how drastically hopes, dreams, and expectations can change in a mere half minute.

Wait, we're talking about the Attention Web World Cup, right? Well, in that game, the U.S. was triumphant in defeating Portugal 25 - 22, knocking Portugal out of the next round.

For those of you anxiously anticipating which teams are moving on in the AWWC, or how your favorite country is faring, you can see all match results and standings in this sheet.

If you look at the results of each match, you'll likely notice a few interesting points. First, engagement for any particular country seems fairly consistent, although we only have two sample points per team to really make this a solid statement. More interestingly, the vast majority of countries have had very similar engagement times during their respective World Cup games—the exception being Nigeria, who has been totally smoking their opponents. (Their time differential through two matches is a whopping +16.8 seconds.)

Going into the AWWC, I wasn’t entirely sure if we’d see any drastic country-by-country differences in engagement. This is why I thought the Attention Web World Cup would be pretty interesting to play out. Any team could win any given match. We’d be at the whim of sampling variance, and we’d be able to hopefully gain some insight into international variability of Engaged Times.

There may not be any large noticeable differences across countries, and there are likely a lot of not-very-interesting reasons for this. In fact, I’m not entirely sure this is a good question to ask. I mean, why would there be drastic differences, right? But in discussions with fellow Data Science team members here at Chartbeat, we wondered whether any differences in how users engage with content might show up when grouping by the language of the website rather than by the country from which the user accesses a site. After all, we live in a global society, and perhaps people tend to gravitate towards content written in their primary language.

To get a feel for this, I looked at the Engaged Times I gathered for all games so far (as of Monday morning), but broken out by the language of the website. Below is what these distributions look like. The data are sorted by median Engaged Time.


A note about this plot:

This is a type of statistical visualization known as a box plot. Disclaimer: I’m a big fan of box plots. A box plot gives us a feel for the distribution of data. The left end of each box represents the 25th percentile of the data, the right end of each box represents the 75th percentile, and the heavy black vertical line represents the median. For example, for Chinese language websites, the 25th percentile is 14 seconds, the median is 33 seconds, and the 75th percentile is 41 seconds. The thin lines at either end of the box, known as the whiskers, extend to the minimum and maximum data points measured.
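To make the box plot summary concrete, here is a small Python sketch that computes the five numbers such a plot draws: minimum, 25th percentile, median, 75th percentile, and maximum. The sample values and the function name `box_plot_summary` are made up for illustration. The "inclusive" quantile method is only one of several common conventions, so other tools may report slightly different quartiles.

```python
import statistics


def box_plot_summary(engaged_times):
    """Return the five values shown by a box plot with min/max whiskers."""
    # n=4 splits the data into quartiles and returns the three cut points.
    q1, median, q3 = statistics.quantiles(engaged_times, n=4, method="inclusive")
    return {
        "min": min(engaged_times),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(engaged_times),
    }


# Hypothetical Engaged Times in seconds, not real measurements.
sample = [5, 10, 15, 20, 25, 30, 35, 40, 45]
summary = box_plot_summary(sample)
print(summary)
```

For this sample, the box would run from 15 to 35 seconds with the median line at 25, and the whiskers would reach out to 5 and 45.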

We do see slight differences in engagement by language, but they are not overwhelming. The distributions overlap quite a bit. That said, there is a 15-second difference between the largest median Engaged Time (Chinese-language sites) and the smallest (Arabic-language sites). Western European languages tend to have nearly identical engagement. If there is any pattern, content written in Eastern languages like Chinese and Japanese does appear to draw more engagement than content in Western languages, but I'm no expert on languages, so I’m hesitant to conjecture further. And, let’s be honest here, I haven't done a rigorous analysis; it has so far just been a fun exploration.

On Friday, we’ll be announcing our lineups for the knock-out stage. As always, I’d love to hear your thoughts, so feel free to tweet them @dpvalente using the hashtag #AWWC.

On Engagement & Viewability: Why Quality Content Makes Good Business Sense

June 19th, 2014 by Josh

On March 31, the Media Rating Council (MRC) announced it was lifting its advisory on viewable impressions for display advertising, bringing the industry one step closer to transacting on viewability for the first time. The point at which publishers are asked to deliver highly viewable campaigns is rapidly approaching. If you haven’t started to develop a strategy to maximize the viewability of your ads, I’d wager that in the next three months, you will.

There are many tactics that can be applied to improve your ads' viewability: ensuring fast ad loads, lazy-loading advertisements, and redesigning a website to feature always-in-view units.

One issue has gotten surprisingly little discussion, though: Ads are much more viewable on pages that people actually want to read. Take a look at the following figure, which was computed across a sample of a billion ad impressions across the month of May 2014.

[Figure: fraction of ads viewed versus Engaged Time on the page]

We see there’s a strong relationship between what fraction of ads are seen and how long a person spends reading the page: as Engaged Time increases from 15 seconds to one minute, viewability goes up by over half in relative terms, from 37% to 57% (a gain of 20 percentage points). Visitors who read for more than 75 seconds see more than 60% of advertisements.

This isn’t too surprising. Of course, people who read pages more deeply see more of the ads on the page, but it’s still worth taking note. We’ve argued for years that articles with higher average Engaged Time should be promoted because they represent the articles your audience is most interested in, and, in days when viewability is more critical than ever, promoting your most deeply read articles makes good business sense, too.

Want more? Download the Chartbeat Quarterly.

Attention Web World Cup: Weekend 1 Update

June 16th, 2014 by Dan


Last week was the start of the World Cup, which meant the kickoff of Chartbeat’s Attention Web World Cup. We’re just 11 matches in and we’ve already seen some pretty awesome games. Some of my favorites include: The Netherlands soundly defeating the defending champs; Costa Rica surprising Uruguay with a 3-1 upset; and Switzerland scoring in the 93rd minute to defeat Ecuador.

But, for those of you who were disappointed in the performance of your teams over the weekend, here’s your chance for redemption. Below are the scores for how the teams fared this weekend in the Attention Web World Cup, and they are quite different than the outcome of the “real” World Cup.

Engagement between countries is very similar … this is truly anyone’s cup!

(Winning score highlighted in green, draws in yellow.)




Wait, how does a draw work in the AWWC?

Many of you will notice that in some games a two-second differential, for example, will result in a win for one of the teams, yet in another game, a two-second differential will result in a draw. Take, for example, the Cote d’Ivoire/Japan matchup. Japan had a median Engaged Time of 26.0 seconds, and Cote d’Ivoire had only 20.0 seconds. A six-second differential, but we had a draw? What’s with that?

As I said in the last post, I determine the winner in a statistical manner. Over the course of the game, I sample Engaged Time for users from each country for the top 20 articles on each of Chartbeat’s sites. This results in a distribution of times for each team. To determine a winner, I ask, statistically, whether these two distributions are different. In other words, I try to determine whether, given a large enough sample of Engaged Times for each country, one country would consistently have a larger median Engaged Time. The problem (and this is a fundamental concept in statistics) is that the size of a sample is directly related to the precision with which you can judge your statistic of interest. In our case, this means that the more data we have, the narrower the margin we need to declare a winner.

And here’s the rub: For countries like Cote d’Ivoire and Japan, we didn’t have many samples to look at. With these distributions, there is too much variability in the data for us to precisely determine whether the 26-second median we measured for Japan is, in actuality, truly larger than the 20-second median we measured for Cote d’Ivoire. We just can’t know if Japan had such a large median only because of the particular sample we drew in comparison to Cote d’Ivoire’s sample.
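One way to picture this kind of decision is a bootstrap comparison of medians, sketched below in Python. This is an illustrative stand-in, not the test actually used for the AWWC, which this post does not name. The function name `decide_match`, the 95% interval, and the number of resamples are all assumptions. The idea is to resample each team's Engaged Times many times, look at the spread of the difference in medians, and call a draw whenever that spread still includes zero.

```python
import random
import statistics


def decide_match(times_a, times_b, n_resamples=2000, alpha=0.05, seed=0):
    """Return 'A', 'B', or 'draw' from a percentile bootstrap of the median gap."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = []
    for _ in range(n_resamples):
        # Resample each team's times with replacement, keeping the sample size.
        resample_a = rng.choices(times_a, k=len(times_a))
        resample_b = rng.choices(times_b, k=len(times_b))
        diffs.append(statistics.median(resample_a) - statistics.median(resample_b))

    diffs.sort()
    # Approximate bounds of the central (1 - alpha) share of the differences.
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]

    if lower > 0:
        return "A"      # A's median is reliably larger
    if upper < 0:
        return "B"      # B's median is reliably larger
    return "draw"       # the interval includes zero: too close to call


# Made-up samples: widely separated times give a clear winner,
# while identical samples cannot be told apart.
print(decide_match(list(range(40, 60)), list(range(10, 30))))
print(decide_match([10, 20, 30, 40, 50], [10, 20, 30, 40, 50]))
```

With only a handful of samples per team, the resampled medians swing widely, so even a gap of several seconds can leave zero inside the interval. With many samples the interval tightens and a much smaller gap is enough to pick a winner.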

In this way, the Attention Web World Cup is quite democratic. Those countries whose web presence across our sites isn’t very large don’t automatically get relegated to the bottom of the heap; they have a good chance at getting one point through a draw.

Keep checking back for updates and tweet about your favorites using #AWWC.

Boa Sorte e Divirta-se!