After a month of exciting matches, the Attention Web World Cup has come to a close. In a time-honored tradition (pun intended) Ghana defeated the US with a score of 30 to 25. Congratulations to everyone from Ghana who was consuming content on the web during World Cup matches; you all contributed to this amazing achievement! And to my fellow Americans: next time around, let’s spend more time reading, okay?

To wrap up the festivities, one of our designers made these awesome animations of the time course of each tournament game based on the data I pulled. These plots show the median Engaged Time for users from each country as each match progresses.

When you view these animations, you’ll likely notice that some of these countries have incredibly stable Engaged Times while others have Engaged Times that are incredibly erratic. The U.S., for instance shows a very small amount of variance in median Engaged Time, while Cote d’Ivoire and Cameroon have median Engaged Times that jump all over the place.

This behavior is a consequence of sample size. At any particular time during a match, users from many of the African countries and other smaller countries were a much smaller sample size than, say, users from the US or Australia. In statistics and data analysis, we’re always concerned about sample size for exactly the reason illustrated in many of these graphs. The variability in the sampled statistic can mask the “true” value. We can try to capture this with a distribution, but if the width of that distribution is large, then we can’t be very confident in the value of whatever measure of central tendency we choose (mean, median, mode, etc.). And sample variance depends on the inverse of the sample size, so only as the number of points we’ve sampled gets large do we have a hope that the confidence in our estimate will rise.

I’m actually quite surprised the U.S. made it so far in my scoring scheme here. I knew going into the #AWWC that some countries were sorely underrepresented in our sample. I expected a fair chance that these countries would show a falsely high median Engaged Time. If enough of the small sample of users just so happened to be long-engagement users, this would skew their results. In the Group Round this was okay, because I performed a statistical test that tried to account for this variability. There, I asked a very common statistical question: Assuming these two teams actually have the same median Engaged Time, what is the probability that I’d observe a difference in medians at least as extreme as the one I’ve observed? If that probability was low enough, then I declared Team A and Team B to have different medians, and took the higher one as the winner. But in the bracket round, we needed clear winners (no draws were allowed), so we left it up to sampling variance. For the small-sample-size teams, this was a double edged sword. They only needed a few users spending an inordinate time engaged with content to edge above the higher-sample-size teams. But, conversely, if the users they had spent very short times, that would skew towards losing. We can see, though, that this seemed to work out well for these counties—they made a great showing all the way through the AWWC.

Thinking about variability is my job, so I might be biased here (yes, a statistics pun), but I hope you enjoyed this fun exploration of our data. I hope it got you thinking about international variability in engagement, and variability of metrics in general. Tweet me @dpvalente or email me at dan@chartbeat if you want to continue the discussion.