Attention Web World Cup: The Final Week of the Group Stage

June 23rd, 2014 by Dan


If you're anything like me, you're still in a zombie-like stupor over the last 30 seconds of Sunday night's USA-Portugal game. It's astounding how drastically hopes, dreams, and expectations can change in a mere half minute.

Wait, we're talking about the Attention Web World Cup, right? Well, in that game the U.S. defeated Portugal 25-22, knocking Portugal out of contention for the next round.

For those of you anxiously anticipating which teams are moving on in the AWWC, or how your favorite country is faring, you can see all match results and standings in this sheet.

If you look at the results of each match, you'll likely notice a few interesting points. First, engagement for any particular country seems fairly consistent, although with only two data points per team, that is far from a solid statement. More interestingly, the vast majority of countries have had very similar Engaged Times during their respective World Cup games. The exception is Nigeria, which has been totally smoking its opponents. (Its time differential through two matches is a whopping +16.8 seconds.)
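For anyone curious about the arithmetic, here is a minimal Python sketch of a time differential. It assumes the differential works like goal differential in soccer: the sum over matches of a team's score minus its opponent's score, with each score presumably an Engaged Time in seconds. The function name and the match scores are hypothetical, made up for illustration, and are not actual AWWC results.

```python
# Hypothetical sketch: assumes "time differential" works like goal
# differential, i.e. the sum over matches of (team score - opponent score),
# where each score is an Engaged Time in seconds.
from typing import Iterable, Tuple


def time_differential(matches: Iterable[Tuple[float, float]]) -> float:
    """Sum of (team score - opponent score) across all matches played."""
    return sum(team - opponent for team, opponent in matches)


# Invented scores for illustration only (not real AWWC results).
example_matches = [(30.0, 20.0), (28.0, 21.0)]
print(time_differential(example_matches))  # 17.0
```

A positive total means a team has, on balance, out-engaged its opponents; a team that lost every match would end up negative.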

Going into the AWWC, I wasn't entirely sure whether we'd see any drastic country-by-country differences in engagement, which is exactly why I thought the Attention Web World Cup would be interesting to play out. Any team could win any given match. We'd be at the whim of sampling variance, and we'd hopefully gain some insight into international variability in Engaged Time.

There may not be any large, noticeable differences across countries, and there are likely plenty of not-very-interesting reasons for this. In fact, I'm not entirely sure this is a good question to ask. I mean, why would there be drastic differences, right? But in discussions with fellow members of the Data Science team here at Chartbeat, we wondered whether, if users from different countries do engage with content differently, that difference might show up more clearly when grouping by the language of the website rather than by the country from which the user accesses it. After all, we live in a global society, and perhaps people tend to gravitate toward content written in their primary language.

To get a feel for this, I took the Engaged Times I've gathered for all games so far (as of Monday morning) and broke them out by the language of the website. Below is what these distributions look like, sorted by median Engaged Time.

[Figure: Engaged Time by website language, shown as box plots sorted by median Engaged Time]

A note about this plot:

This is a type of statistical visualization known as a box plot. (Disclaimer: I'm a big fan of box plots.) A box plot gives us a feel for the distribution of the data. The left end of each box marks the 25th percentile, the right end marks the 75th percentile, and the heavy black vertical line marks the median. For example, for Chinese-language websites, the 25th percentile is 14 seconds, the median is 33 seconds, and the 75th percentile is 41 seconds. The thin lines at either end of the box, known as the whiskers, extend to the minimum and maximum data points measured.
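To make the five numbers behind each box concrete, here is a minimal Python sketch that computes them for a list of Engaged Times, with whiskers at the minimum and maximum as described above. Quartile conventions differ between tools, so using `method="inclusive"` in Python's `statistics.quantiles` is an assumption, not necessarily what produced the plot. The function name and the sample values are invented for illustration.

```python
# Minimal sketch: the five numbers behind one box in a box plot, with
# whiskers at the min and max. Quartile conventions vary between tools;
# method="inclusive" is one common choice (an assumption here).
from statistics import median, quantiles
from typing import Dict, List


def box_plot_summary(seconds: List[float]) -> Dict[str, float]:
    """Return min, 25th percentile, median, 75th percentile, and max."""
    q1, _, q3 = quantiles(seconds, n=4, method="inclusive")
    return {
        "min": min(seconds),        # end of the left whisker
        "q1": q1,                   # left edge of the box (25th percentile)
        "median": median(seconds),  # heavy vertical line inside the box
        "q3": q3,                   # right edge of the box (75th percentile)
        "max": max(seconds),        # end of the right whisker
    }


# Invented Engaged Times in seconds (not real AWWC data).
print(box_plot_summary([5, 10, 20, 30, 60, 90, 120]))
# {'min': 5, 'q1': 15.0, 'median': 30, 'q3': 75.0, 'max': 120}
```

Many plotting libraries instead cap the whiskers at 1.5 times the interquartile range and draw points beyond that as outliers, so it is worth checking which convention a given plot uses.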

We do see slight differences in engagement by language, but they are not overwhelming; the distributions overlap quite a bit. That said, the median Engaged Time for Chinese-language sites (the largest) is 15 seconds longer than that for Arabic-language sites (the smallest). Western European languages tend to have nearly identical engagement. If there is any pattern, content written in Eastern languages like Chinese and Japanese does appear to draw more engagement than content in Western languages, but I'm no expert on languages, so I'm hesitant to conjecture further. And, let's be honest here, I haven't done a rigorous analysis; so far this has just been a fun exploration.
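The grouping behind this comparison can be sketched in a few lines of Python: bucket each Engaged Time by the site's language, take the median of each bucket, and sort. This is a hypothetical illustration, not the code used for the plot; the `(language, seconds)` input format, the function name, and the sample values are all assumptions made up for the example.

```python
# Hypothetical sketch: group (language, engaged_seconds) samples by website
# language, compute each group's median, and sort by that median.
# The input format and the sample data are assumptions for illustration.
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def medians_by_language(samples: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (language, median Engaged Time) pairs, largest median first."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for language, seconds in samples:
        groups[language].append(seconds)
    return sorted(
        ((language, median(values)) for language, values in groups.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Invented Engaged Times in seconds (not real AWWC data).
samples = [
    ("Japanese", 40.0), ("Japanese", 50.0),
    ("French", 20.0), ("French", 30.0), ("French", 34.0),
    ("German", 12.0),
]
ranked = medians_by_language(samples)
print(ranked)  # [('Japanese', 45.0), ('French', 30.0), ('German', 12.0)]
print(ranked[0][1] - ranked[-1][1])  # 33.0
```

The difference between the first and last entries is the same kind of largest-versus-smallest median comparison made above. Medians are used rather than means because they are less sensitive to a few very long sessions.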

On Friday, we'll be announcing our lineups for the knockout stage. As always, I'd love to hear your thoughts, so feel free to tweet them to @dpvalente using the hashtag #AWWC.