Archive for June, 2014


After two weeks of intense international web engagement, our bracket for the Attention Web World Cup is set. Many of the groups came down to the very last game, and if you’ve been following along, you witnessed the excitement of Honduras narrowly edging out Switzerland by one second and the USA keeping their two-second halftime lead to defeat Germany and advance to the knockout stage of the Attention Web World Cup.

The first round looks to have some really exciting matchups, if the scores from the Group Round are any indicator. Nigeria, however, appears to be the clear favorite going into the next round.

There will be a few small rule changes for the Attention Web World Cup from here on out. First, there will be no draws allowed, so we’re throwing statistical significance out the door and determining the winner only by the team with the highest median Engaged Time. Second, although the teams in the #AWWC are different than in the real World Cup, each match will still be played at the same time as the corresponding match.

Keep on checking back for bracket updates and blog posts. Boa Sorte e divirta-se!


If you’re anything like me, you’re still in a zombie-like stupor over the last 30 seconds of Sunday night’s USA-Portugal game. It’s astounding how drastically hopes, dreams, and expectations can change in a mere half minute.

Wait, we’re talking about the Attention Web World Cup, right? Well, in that game, the U.S. was triumphant in defeating Portugal 25 – 22, knocking Portugal out of the next round.

For those of you anxiously anticipating which teams are moving on in the AWWC, or how your favorite country is faring, you can see all match results and standings in this sheet.

If you look at the results of each match, you’ll likely notice a few interesting points. First, engagement for any particular country seems fairly consistent, although we only have two sample points per team to really make this a solid statement. More interestingly, the vast majority of countries have had very similar engagement times during their respective World Cup games—the exception being Nigeria, who has been totally smoking their opponents. (Their time differential through two matches is a whopping +16.8 seconds.)

Going into the AWWC, I wasn’t entirely sure if we’d see any drastic country-by-country differences in engagement. This is why I thought the Attention Web World Cup would be pretty interesting to play out. Any team could win any given match. We’d be at the whim of sampling variance, and we’d be able to hopefully gain some insight into international variability of Engaged Times.

There may not be any large noticeable differences across countries, and there are likely a lot of not-very-interesting reasons for this. In fact, I’m not entirely sure this is a good question to ask. I mean, why would there be drastic differences, right? But in discussions with fellow Data Science team members here at Chartbeat, we wondered whether any differences in how users from different countries engage with content might come from grouping by the language of the website rather than by the country from which the user accesses a site. After all, we live in a global society, and perhaps people tend to gravitate towards content written in their primary language.

To get a feel for this, I looked at the Engaged Times I gathered for all games so far (as of Monday morning), but broken out by the language of the website. Below is what these distributions look like. The data are sorted by median Engaged Time.


A note about this plot:

This is a type of statistical visualization known as a box plot. Disclaimer: I’m a big fan of box plots. They give us a feel for the distribution of data. The left end of each box represents the 25th percentile of the data, the right end of each box represents the 75th percentile, and the heavy black vertical line represents the median. For example, for Chinese language websites, the 25th percentile is 14 seconds, the median is 33 seconds, and the 75th percentile is 41 seconds. The thin lines at either end of the box, known as the whiskers, extend to the minimum and maximum data points measured.
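To make the anatomy of a box plot concrete, here is a minimal sketch that computes the five numbers a box plot draws. The function name and the list of times are made up for illustration (the list is chosen so that its quartiles match the Chinese-language figures quoted above); it is not the data behind the plot.

```python
from statistics import quantiles


def box_plot_summary(times):
    """Return the five numbers a box plot draws for a list of Engaged Times."""
    # method="inclusive" treats the list as the whole data set, so the
    # quartiles always fall between the observed minimum and maximum.
    q1, med, q3 = quantiles(times, n=4, method="inclusive")
    return {"min": min(times), "q1": q1, "median": med, "q3": q3, "max": max(times)}


# Made-up Engaged Times in seconds (illustrative only).
summary = box_plot_summary([5, 9, 14, 20, 33, 36, 41, 47, 60])
print(summary)
```

The box spans `q1` to `q3`, the heavy line sits at `median`, and the whiskers reach out to `min` and `max`.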

We do see slight differences in engagement by language, but they are not overwhelming. The distributions overlap quite a bit. That said, there is a 15-second difference between the largest median Engaged Time (Chinese language sites) and the smallest (Arabic language sites). Western European languages tend to have nearly identical engagement. If there is any pattern, content written in Eastern languages like Chinese and Japanese does appear to gain more engagement than content in Western languages, but I’m no expert on languages, so I’m hesitant to conjecture further. And, let’s be honest here, I haven’t done a rigorous analysis; it has so far just been a fun exploration.

On Friday, we’ll be announcing our lineups for the knock-out stage. As always, I’d love to hear your thoughts, so feel free to tweet them @dpvalente using the hashtag #AWWC.

On March 31, the Media Rating Council (MRC) announced it was lifting its advisory on viewable impressions for display advertising, bringing the industry one step closer to transacting on viewability for the first time. The point at which publishers are asked to deliver highly viewable campaigns is rapidly approaching. If you haven’t started to develop a strategy to maximize the viewability of your ads, I’d wager that in the next three months, you will.

There are many tactics that can be applied to improve your ads’ viewability: ensuring fast ad loads; lazy-loading advertisements; and redesigning a website to feature always in-view units.

One issue has gotten surprisingly little discussion, though: Ads are much more viewable on pages that people actually want to read. Take a look at the following figure, which was computed across a sample of a billion ad impressions across the month of May 2014.

[Figure: ad viewability as a function of Engaged Time]

We see there’s a strong relationship between what fraction of ads are seen and how long a person spends reading the page: as Engaged Time increases from 15 seconds to one minute, viewability goes up by over half, from 37% to 57%. Visitors who read for more than 75 seconds see more than 60% of advertisements.
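For anyone who wants to run this kind of check on their own data, here is a rough sketch of how viewability could be bucketed by Engaged Time. The record layout (pairs of engaged seconds and a viewed flag), the 15-second bucket width, and the sample records are all assumptions for illustration, not the pipeline behind the figure. The last lines just redo the arithmetic behind "up by over half" using the 37% and 57% figures quoted above.

```python
from collections import defaultdict


def viewability_by_engaged_time(impressions, bucket_seconds=15):
    """Map each Engaged Time bucket (its starting second) to the fraction of ads viewed."""
    served = defaultdict(int)
    viewed = defaultdict(int)
    for engaged_seconds, was_viewed in impressions:
        bucket = int(engaged_seconds // bucket_seconds) * bucket_seconds
        served[bucket] += 1
        viewed[bucket] += int(was_viewed)
    return {bucket: viewed[bucket] / served[bucket] for bucket in sorted(served)}


# Made-up (engaged_seconds, was_viewed) records, purely illustrative.
impressions = [(5, False), (12, True), (20, False), (25, True), (70, True), (80, True)]
rates = viewability_by_engaged_time(impressions)
print(rates)

# Relative increase implied by going from 37% to 57% viewability.
relative_lift = (0.57 - 0.37) / 0.37
print(f"{relative_lift:.0%}")
```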

This isn’t too surprising. Of course, people who read pages more deeply see more of the ads on the page, but it’s still worth taking note. We’ve argued for years that articles with higher average Engaged Time should be promoted because they represent the articles your audience is most interested in, and—in the days where viewability is more critical than ever—promoting your most deeply read articles makes good business sense, too.

Want more? Download the Chartbeat Quarterly.


Last week was the start of the World Cup, which meant the kickoff of Chartbeat’s Attention Web World Cup. We’re just 11 matches in and we’ve already seen some pretty awesome games. Some of my favorites include: The Netherlands soundly defeating the defending champs; Costa Rica surprising Uruguay with a 3-1 upset; and Switzerland scoring in the 93rd minute to defeat Ecuador.

But, for those of you who were disappointed in the performance of your teams over the weekend, here’s your chance for redemption. Below are the scores for how the teams fared this weekend in the Attention Web World Cup, and they are quite different than the outcome of the “real” World Cup.

Engagement between countries is very similar … this is truly anyone’s cup!

(Winning score highlighted in green, draws in yellow.)




Wait, how does a draw work in the AWWC?

Many of you will notice that in some games a two-second differential, for example, will result in a win for one of the teams, yet in another game, a two-second differential will result in a draw. Take, for example, the Cote d’Ivoire/Japan matchup. Japan had a median Engaged Time of 26.0 seconds, and Cote d’Ivoire had only 20.0 seconds. A six-second differential, but we had a draw? What’s with that?

As I said in the last post, I determine the winner in a statistical manner. Over the course of the game, I sample Engaged Time for users from each country for the top 20 articles on each of Chartbeat’s sites. This results in a distribution of times for each team. To determine a winner, I ask, statistically, whether these two distributions are different. In other words, I try to determine whether, if I had a large enough sample of Engaged Times for each country, one country would consistently have a larger median Engaged Time. The problem, and this is a fundamental concept in statistics, is that the size of a sample is directly related to the precision with which you can judge your statistic of interest. In our case, this means that the more data we have, the narrower the margin by which we can determine a winner.

And here’s the rub: For countries like Cote d’Ivoire and Japan, we didn’t have many samples to look at. With these distributions, there is too much variability in the data for us to precisely determine whether the 26-second median we measured for Japan is, in actuality, truly larger than the 20-second median we measured for Cote d’Ivoire. We just can’t know if Japan had such a large median only because of the particular sample we drew in comparison to Cote d’Ivoire’s sample.
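To see how a six-second gap can still be a draw, here is a minimal sketch using a permutation test on the difference in medians. This is one common way to ask the question and is an assumption on my part, as are the function names, the 0.05 threshold, and the made-up samples; it is not necessarily the exact test used for the AWWC scores.

```python
import random
from statistics import median


def median_permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference in medians of a and b."""
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Reassign the pooled times to the two teams at random.
        rng.shuffle(pooled)
        diff = abs(median(pooled[:len(a)]) - median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


def match_result(a, b, alpha=0.05):
    """Return 'draw' unless the medians differ significantly at level alpha."""
    if median_permutation_test(a, b) >= alpha:
        return "draw"
    return "a wins" if median(a) > median(b) else "b wins"


# Made-up tiny samples with medians of 26 and 20 seconds.
small_a = [24.0, 26.0, 31.0]
small_b = [15.0, 20.0, 22.0]
print(match_result(small_a, small_b))
```

With only three samples per side, a random relabeling of the pooled times produces a gap of six seconds or more quite often, so the test cannot call a winner. With many samples and the same clear separation, the same function does declare one.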

In this way, the Attention Web World Cup is quite democratic. Those countries whose web presence across our sites isn’t very large don’t automatically get relegated to the bottom of the heap; they have a good chance at getting 1 point through a draw.

Keep checking back for updates and tweet about your favorites using #AWWC.

Boa Sorte e Divirta-se!


The World Cup kicked off yesterday with a fantastic (albeit controversial) game between Brazil and Croatia, with Brazil winning 3 – 1. Over the next month, we soccer fans will be glued to our screens—tablets, phones, TVs, you name it. We’ll be watching games, replaying highlights, reading articles, checking stats, tweeting, messaging our friends. The World Cup will consume our lives.

As a soccer fan, I started worrying that this would result in a significant drop in my productivity, which, you know, wouldn’t be that great since we here at Chartbeat are constantly working to deliver fantastic products and provide key data insights. I got to thinking about how many of our customers will be putting out amazing content covering the World Cup, how I will likely spend a lot of time reading this content, and how I’ll be doing my best to ensure that our data will help you deliver your content as effectively as possible. This is when an idea struck me.

Chartbeat should hop on the World Cup bandwagon, and use our data to say something insightful about engagement during the games. Those of you who stare at the Chartbeat Publishing Dashboard all day know that content is consumed differently by visitors from different countries, and as a Chartbeat Data Scientist, I started wondering exactly how engagement with content varied across countries. Could I do an analysis to provide some interesting, useful insights into how users from different locations consumed content? And then, I said to myself: Bah! Let’s just have some fun.

So, Chartbeat would like to introduce the Attention Web World Cup, a friendly international competition to see which country has the most engagement with content. We’ll be pitting each country in the World Cup against each other as they consume content on the web during World Cup matches.

How does a match in the AWWC work?

At the same time that a World Cup match is being played, we sample engagement across all of Chartbeat’s sites, filtered by users from each of the countries that are playing in the current match. About every five minutes during the game, I take the top twenty articles on each of our domains as judged by the number of concurrent users on that article, and then I grab the average engagement time of each of those articles. This is done for each country separately. I calculate a score at the end of 90 minutes by looking at the median of these engagement times. The country with the highest median Engaged Time is the winner.
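The scoring step described above boils down to a median per country. Here is a minimal sketch, assuming the per-article average engagement times have already been collected into one list per country; the function name, team labels, and numbers are made up for illustration, and the sketch leaves out the sampling itself and any check for draws.

```python
from statistics import median


def score_match(samples_by_country):
    """Return (scores, winner), where each score is a median Engaged Time in seconds."""
    scores = {country: median(times) for country, times in samples_by_country.items()}
    winner = max(scores, key=scores.get)
    return scores, winner


# Made-up per-article average engagement times (seconds), pooled over a match.
samples = {
    "Team A": [12.0, 18.0, 25.0, 31.0, 40.0],
    "Team B": [10.0, 22.0, 27.0, 35.0, 90.0],
}
scores, winner = score_match(samples)
print(scores, winner)
```

Using the median rather than the mean means that a single outlier, like the 90-second article for Team B here, does not decide the match on its own.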

I thought I’d be fair about this, so I score in a statistical manner, since there is a distribution of times. A draw is possible if the difference between the medians isn’t statistically significant. How did we choose this scoring scheme? Did we have a big meeting here in the Hall of Justice at Chartbeat Studios to decide on the rules of the game? Were there knock-down, drag-out arguments between members of our data science team? No. I just arbitrarily decided this one day last week and stuck with it. So, yeah, there’s that.

So how did the first match of the #AWWC play out?

Match 1: Brazil v. Croatia

[Figure: Match 1 result, Brazil v. Croatia]

And Croatia is the winner! It was quite a close game, but even at the half, Croatia was ahead. Looking at the distribution of Engaged Times shows that engagement was, in fact, quite similar between the two countries. See below: Brazil is in yellow, Croatia is in red.


A larger proportion of Brazilians spent little time engaged with content, and a small percentage of Croatians spent a large amount of time engaged with content. These were the superstars for the Croatian team in the first match of the Attention Web World Cup, and they pushed their team to victory.

We can conjecture as to why this is so: Perhaps fans of the Croatian team were scouring sites on the net to find explanations for why those two calls (the penalty and the no-goal) went the way they did, or perhaps Croatians were re-reading all the articles that said Croatia had absolutely no chance of beating Brazil. Were the Brazilian fans more focused on the game, and less on their “second screens”? Whatever the reason, Croatian fans can find some solace in the fact that, at least in the Attention Web World Cup, they came out ahead.

I’m going to be doing this for every match in the Group stage of the World Cup, followed by the tournament style bracket, so keep on checking our blog over the next month for updates on how your favorite team fares. You can tweet me @dpvalente for further discussion about the scoring scheme or anything else related to data/analysis.