Last year, we released the Chartbeat Publishing Video Dashboard, giving you real-time insight into how your audience is engaging with your video content. As anticipated, we heard from many of you that you needed a deeper view of overall performance, across longer time frames, to truly understand how your content is performing.
Want to see the Daily Video Perspective + Video Dashboard in action? We’re hosting a webinar-style walkthrough on October 1st at 1 p.m. EST. Register Now
Enter the Daily Video Perspective, which shows your best-performing videos from the previous day (or any day). Sort by time spent watching or video starts to see which videos were popular and for how long.
Under the heading Appears On, you’ll see exactly where on your site this particular video worked best. The upshot: video has to be in highly visible places (well-trafficked articles, for example), but the fit has to be right as well. If it isn’t, starts and engagement will suffer.
And keep your eye out for videos that still have life in them. What has the legs to carry over to today?
See top videos by starts and total engaged minutes: get video in front of more eyes and optimize clips for better attention.
Use the Summary tab to get an overview of the previous day’s total video performance stacked up against the past 30 days, in two categories: total engagement and total starts. You’ll also be able to see when your video audience was on your site with a 24-hour time series (blue line) compared to seven days ago (grey line).
Our goal in building out our Android and iOS SDKs for in-app tracking is to help you go beyond the (mobile) web to give you the whole story on your visitors and the way they interact with your content.
It's clear your mobile audience, especially in-app, needs to be represented in the dashboard. As we all know, mobile traffic is only growing (up to 80% year over year, with video being a strong driver), while the number of pure Internet users has only experienced a slight increase (less than 10% year over year).
Once you add our SDKs, these audiences will be visible directly in your dashboard. See what content they're consuming, their engagement levels, visitor frequency, time-of-day trends, and more: every dimension of audience identity and behavior that the Chartbeat dashboard supports.
Want access to the Video Dashboard? Interested in our mobile SDKs? Just want to share your thoughts and feedback? Great! We want to hear from you.
This is the third post in a series about online advertising measurement and methodologies. Feel free to email me or post in the comments section about topics you’d like to see covered in this series. Curious about Chartbeat advertising tools? Learn more here.
In the last post, we took a look at what viewability means for publishers. Now, we’re turning the tables and breaking down what viewability means for advertisers. And with that new transparency come new terms and conditions. For example, we’re slowly starting to see agencies build viewability percentage goals or viewable impression guarantees as line items in their requests for proposals (RFPs). This means that prior to advertisers signing any contracts, media sellers will either agree to optimize an advertiser’s campaign toward a certain (goal) viewability percentage, or they will guarantee a mutually agreed upon viewability percentage. Sellers will be expected to deliver on said targets in order to receive payment in full.
On a larger scale, viewability is forcing marketers to rethink their approaches to media buying. For some, that may mean rolling out a campaign and then only paying for ads that were in view. For others it may mean doing a “controlled” buy beforehand to only serve ads that are in view. As we mentioned in the previous post, publishers and advertisers are still trying to find middle ground here. That said, larger companies like Google and Yahoo!, as well as a few smaller networks, are already allowing advertisers to pay only for viewable ads. Whether that will be the go-to model moving forward is still uncertain. We’ll have more on this topic later in our series.
By eliminating non-viewable ads from the picture, advertisers will be able to allocate more, if not all, of their campaigns to premium content sites. Let’s break it down: As I mentioned in my last post, viewability will allow sites that offer quality content and user experiences to stand out from those that do not. If we link viewability to page quality, publishers seeking to increase viewability will likely improve—if they haven’t already—the quality of their sites by making adjustments to page layouts, ad quantity and type. As site quality increases, engagement will likely increase as well, making certain pages and audiences become more valuable to advertisers. Conversely, the bad-actors (pages cluttered with ads, link-bait content) will find their inventory becoming less and less valuable to the market. Advertisers will be able to compare the performance of their campaigns across publishers, and will ultimately choose to buy ads on sites that offer both quality content and a quality audience.
In short, more transparency will lead to more informed ad buys, and theoretically, less wasted ad spend. When demand-side platforms (DSPs) optimize for viewable ads, thus taking non-viewable ads out of the equation, ad dollars will deliver more ROI.
First, there is the obstacle of an impression being viewable versus being actually viewed. If you’ll recall, the IAB defines a viewable impression as one that’s at least 50% visible in the viewable portion of a person’s browser window for at least one second. So, technically, viewability tracks if an ad has a chance to be seen by a user, not if it actually was. Yes, an ad may have been served, but that doesn’t mean it was seen. Think, for a second, how many times you’ve done a quick scan of a page and completely missed an ad.
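Reduced to code, the IAB definition is just a two-part predicate. The sketch below is illustrative; the function name and signature are ours, not any vendor's API:

```python
# Sketch of the IAB viewable-impression rule as a predicate.
# Function name and signature are illustrative, not a real vendor API.
def is_viewable_impression(visible_fraction: float, seconds_in_view: float) -> bool:
    """At least 50% of the ad's pixels in the viewport for at least one second."""
    return visible_fraction >= 0.5 and seconds_in_view >= 1.0

print(is_viewable_impression(0.6, 1.2))  # True: counted as viewable
print(is_viewable_impression(1.0, 0.4))  # False: fully visible, but too briefly
```

Note that the predicate says nothing about whether a human eye ever landed on the ad, which is exactly the gap described above.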
Note: It’s important to remember that measuring viewability is no easy feat. And while viewability may have a number of blind spots, it’s better than anything we’ve had before. Short of something (pretty crazy) like eye-tracking, viewability does do a solid job of helping to call out the bad stuff, read: non-viewable ads, link-bait content and poor quality, ad-heavy user experiences.
Second, as with publishers, many of the challenges advertisers are facing are due to the binary nature of a viewable impression. Specifically, viewability on its own does not offer a comprehensive measurement of impact or brand lift. A simple “viewable or not” fails to fully measure the effectiveness of an impression, as it doesn’t offer a nuanced view of the extent to which an ad resonated with an audience. In other words, viewability does not take into account that time in view beyond one second improves response rates. We’ll take a deeper dive into Active Exposure Time—measuring the amount of time an actively engaged audience is exposed to a display ad—and how it impacts brand recall later in the series. If you’re dying to know more now, check out this post by our chief data scientist Josh Schwartz.
And finally, the discrepancies in measurements between viewability vendors will create a few obstacles for advertisers as well. Remember how we said publishers may not get paid for ads that one vendor says aren’t viewable, even when another gave the thumbs up? Well, the same goes for advertisers, except they may end up paying for ads that nobody ever sees. Sound familiar? This isn’t the first time advertisers are facing this problem. More on the nitty-gritty mechanics of viewability measurements to come. Stay tuned...
Sometime around 1935, the eminent statistician Ronald A. Fisher met a lady. The lady in question had a seemingly outrageous claim: She could tell, simply by taste, whether milk or tea was added to a cup first. But Fisher was skeptical, and, being the eminent statistician that he was, he developed a method by which to test her claim. He would present her with a series of eight cups; in half of those, milk would be placed in the cup first, and in the other half, tea would be placed in the cup first. Fisher would then hand the cups to the lady in random order and ask her which liquid was added first. If she performed better than chance, he'd believe her claim.
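The arithmetic behind Fisher's skepticism is easy to check. Assuming the lady knows that exactly four of the eight cups are milk-first, a pure guesser is choosing which four cups those are, so the chance of a perfect score is 1 in C(8, 4):

```python
from math import comb

# Fisher's setup: 8 cups, exactly 4 with milk poured first, presented in
# random order. A guesser who knows the 4/4 split picks which four cups
# are "milk first" -- there are C(8, 4) equally likely ways to do so.
def chance_of_perfect_guess(cups: int = 8, milk_first: int = 4) -> float:
    """Probability of identifying every cup correctly by chance alone."""
    return 1 / comb(cups, milk_first)

print(f"1 in {comb(8, 4)}")                # 1 in 70
print(f"{chance_of_perfect_guess():.4f}")  # about 0.0143
```

So a perfect performance would be strong evidence for her claim: random guessing gets all eight right less than 1.5% of the time.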
Fisher’s description of this account was one of the first applications of hypothesis testing, perhaps the most widely used—and arguably one of the most important—statistical concepts of the modern era. That it was focused on a simple two-outcome choice is not surprising. Many processes we run into every day can be modeled in this way, as a coin flip. Will the subway show up on time? Will someone click the link to this article? Will the Detroit Tigers or the Kansas City Royals win this game? ^{1}
These kinds of problems—those in which you only have two outcomes—are known in statistics as Bernoulli processes. The main parameter governing these phenomena is the probability that a trial succeeds. In Fisher’s example, this is the probability that the lady correctly identifies whether milk or tea is added first. For web traffic, this is the probability of clicking a link. In many of these types of two-outcome problems, you want to know how likely it is that you’ll observe some number of successes in a given number of trials. For example, you may be interested in the probability that 50 people will click on a link if 100 people see it. If you make an assumption that each event (i.e., each click) is independent of the previous event, the probability that you see some number of successes can be described by the binomial distribution. With a firm understanding of Bernoulli processes and the binomial distribution, you are equipped for modeling a whole host of binary-outcome problems.
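As a sketch, the binomial probability from the paragraph above can be computed directly. The 30% click probability below is an assumption for illustration:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# If each of 100 visitors clicks independently with probability 0.3,
# 50 clicks is far less likely than the expected n * p = 30 clicks:
print(binomial_pmf(50, 100, 0.3))
print(binomial_pmf(30, 100, 0.3))
```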
A binomial distribution, however, isn’t incredibly useful if we don’t know the probability of success for a single trial. Honestly, this is what we’re typically interested in finding out, and it is really what Fisher tested: He assumed the probability of the lady guessing whether milk or tea was added first was pure chance (50/50), and developed a test to see whether the experimental results were consistent with that assumption. But, in general, how do we determine what this probability is?
There are two ways we can estimate the probability from a set of trials. We could simply count the number of successes we’ve had and divide by the total number of trials. For instance, if we flipped a coin 10 times and it came up heads 3 of those times, we might guess that the coin is an unfair coin, landing on its head only 30% of the time. This is all well and good, but we only flipped the coin 10 times. How certain are we that the probability is actually 0.3? Perhaps it truly is a fair coin and our sample size was just too small.
Alternatively, we could assume that our probability of interest itself has some distribution. That is, perhaps we think that the probability is about 0.3, but we concede that it could be 0.1 or 0.5 or even 0.99999. Treating our parameter as a distribution is the heart of a technique known as Bayesian inference, which is based upon Bayes’ rule:
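Written out in the same terms the surrounding text uses, Bayes' rule reads:

```latex
\mathrm{Prob}(\text{quantity-of-interest} \mid \text{observation})
  = \frac{\mathrm{Prob}(\text{observation} \mid \text{quantity-of-interest})
          \,\mathrm{Prob}(\text{quantity-of-interest})}
         {\mathrm{Prob}(\text{observation})}
```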
Don’t be intimidated by this equation—it is actually fairly intuitive. The left-hand side represents the answer to the question: Given the data we observed, how certain are we that our quantity-of-interest takes on a given value? This is called the posterior distribution. The right-hand side contains information about what we believe about the process we’re interested in. Prob(quantity-of-interest) is known as the prior distribution. This describes our initial beliefs about the quantity we’re trying to find out about; in this case, our probability of success in the Bernoulli trial. Prob(observation | quantity-of-interest) is called the likelihood. The likelihood describes what we believe the distribution of the data to be if we assume our quantity is a specific value. In our click-through/coin-flipping example, this is simply the binomial distribution. If we know the fairness of the coin p, then the probability that we get M successes out of N flips follows a binomial distribution with parameters N and p. Then, a simple multiplication of our prior and our likelihood gives us our posterior. ^{2}
The above equation may not seem very impressive, but the real power of the Bayesian method comes in when we iteratively apply the equation to update our beliefs. That is, we can use a previously calculated posterior as a prior in a next round of calculation to update our posterior. If we do this enough times, we hope to decrease our uncertainty enough so that we can confidently determine what our “true” probability is. The neat thing is that if we choose our prior intelligently, we can get the math to work out so that updates are fairly easy.
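Here is a minimal sketch of that "easy update." With a Beta prior on the click probability and a binomial likelihood, the posterior is again a Beta distribution, so each round of updating is just addition. The batch counts below are made up for illustration:

```python
# Conjugate updating for a Bernoulli probability: with a Beta(a, b) prior
# and a binomial likelihood, the posterior after observing some successes
# and failures is simply Beta(a + successes, b + failures).
def update_beta(a: float, b: float, successes: int, failures: int):
    return a + successes, b + failures

# Start from a flat Beta(1, 1) prior and fold in batches of (made-up) data;
# each round's posterior becomes the next round's prior.
a, b = 1.0, 1.0
for successes, failures in [(3, 7), (28, 72), (310, 690)]:
    a, b = update_beta(a, b, successes, failures)

posterior_mean = a / (a + b)
print(posterior_mean)  # about 0.31
```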
That’s the math, but here is a concrete example. Suppose we’re interested in the probability that a visitor to a website will click on some link. If 10 visitors come to the page, and three of those people click on the link, we might guess that the click-through probability for that link is 3/10 = 0.3, but we wouldn’t be very certain; we only flipped the coin a small number of times. The far left panel on the figure below shows a prior we might build based on that uncertainty. It is peaked near 0.3, but is quite wide. ^{3}
Now suppose that we’ve waited long enough for many, many visitors. The two subsequent panels show how the distribution evolves as we gather more data. When we’ve seen 1000 visitors, we are pretty darn certain that the click-through probability is somewhere very close to 0.3. Now imagine what happens when we’ve seen 10,000 visitors!
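You can verify this narrowing without any plotting, using the closed-form standard deviation of a Beta posterior (a flat Beta(1, 1) prior is assumed here):

```python
from math import sqrt

def beta_std(a: float, b: float) -> float:
    """Standard deviation of a Beta(a, b) distribution."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Posterior width after 3 clicks in 10 visits (flat Beta(1, 1) prior)...
after_10 = beta_std(1 + 3, 1 + 7)
# ...versus after 300 clicks in 1,000 visits:
after_1000 = beta_std(1 + 300, 1 + 700)
print(after_10)    # roughly 0.13
print(after_1000)  # roughly 0.014 (about 9x tighter)
```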
Pretty amazing, right? If we gather data long enough, we can be incredibly certain about our click-through probability. In many cases, this is true. But let’s back up a bit.
In considering Bernoulli processes there is a fundamental underlying assumption that can often be overlooked. The assumption is this: The probability of success, p, is constant from trial to trial. For most phenomena, this is a reasonable assumption. But what if it is not? If the probability varies from trial to trial and this isn’t accounted for in our Bayesian updating, then we can end up becoming very certain about an incorrect probability. Consider the following example, where our probability varies smoothly between 0.3 and 0.6 over the course of 1,000 trials.
What happens when we do Bayesian updating with the same assumptions as above?
Not only does the peak of our posterior jump around wildly, depending on how many trials we do, but we start becoming incredibly certain that the probability is near the dead center of our varying probability function. I like to call this spurious certainty. We have an inaccurate model and too much data! We have become too certain in our beliefs.
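A small simulation makes spurious certainty concrete. Here the true probability drifts sinusoidally between 0.3 and 0.6, standing in for whatever the real variation looks like, but the updating still assumes a fixed probability:

```python
import math
import random

random.seed(0)  # deterministic for the example

# The true click probability drifts smoothly between 0.3 and 0.6 over
# 1,000 trials, but our Beta-Bernoulli updating assumes it is constant.
a, b = 1.0, 1.0  # flat Beta(1, 1) prior
n_trials = 1000
for t in range(n_trials):
    p_true = 0.45 + 0.15 * math.sin(2 * math.pi * t / n_trials)
    if random.random() < p_true:
        a += 1
    else:
        b += 1

posterior_mean = a / (a + b)
posterior_std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
# The posterior concentrates tightly near the center (~0.45) of a range
# the true probability swept through -- spurious certainty in action.
print(posterior_mean, posterior_std)
```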
This may seem like a contrived case, but in actuality, it is not. In fact, we’ve seen data here at Chartbeat to suggest that the probability to click on a link is time dependent. Take the figure below, which shows the average click probability for all links on an anonymous site’s homepage on a particular day.
The probability shows a 70% decrease from the beginning of the day to around 2 p.m., and then back up. In order to accurately depict the click-through behavior of this site's users, we have to take this variation into account to avoid spurious certainty.
How exactly can we take into consideration this time variation? We could add time directly into our Bayesian updates, but to get good data we might have to wait a long time. After all, in the general case we don’t really know what this variation looks like. Does our probability vary by time of day? Day of week? Month? Year? All of these? In reality, we probably don’t have enough time to gather enough data for our Bayesian updating to be very informative.
An alternative way is to forget about doing any sort of modeling and simply use measurements. In this method, we forget about priors and posteriors and likelihoods and just make a histogram of the data we’ve measured. We could, in effect, build an empirical form of the distributions from the figures above. Then we can update our beliefs by averaging the histogram of old data with the histogram of new data; we can even use a weighted average so anomalies will get “smoothed out.” We may not get a nice thin distribution, but at least we capture some of this temporal variation and we avoid spurious certainty. In fact, we've built our Heads Up Display, which measures click-through probabilities, to do exactly this.
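A toy version of that histogram-averaging idea might look like this; the bins, weights, and sample values are illustrative, not what the Heads Up Display actually uses:

```python
from collections import Counter

# Toy empirical approach: histogram the measured click probabilities and
# blend new data into old with a weighted average so anomalies smooth out.
def normalize(counts):
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

def blend(old_hist, new_hist, weight_new=0.2):
    """Weighted average of two normalized histograms over shared bins."""
    bins = set(old_hist) | set(new_hist)
    return {b: (1 - weight_new) * old_hist.get(b, 0.0)
               + weight_new * new_hist.get(b, 0.0)
            for b in bins}

# Yesterday's measured per-link click probabilities (binned to 0.1)
# versus today's -- the values are illustrative:
old = normalize(Counter([0.3, 0.3, 0.4, 0.5]))
new = normalize(Counter([0.5, 0.6, 0.6]))
blended = blend(old, new)
print(blended)
```

Because the blend is a convex combination of two normalized histograms, the result still sums to one, and a single anomalous day only shifts the distribution by the chosen weight.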
In my opinion, we—and by we I mean humanity—should be ever the skeptics. In data analysis, this is paramount. Like Fisher, we should question outrageous claims and find ways to test them. We should revisit old assumptions, test them, and revisit them again. The data will guide the way, but we should always beware of spurious certainty.
Or, you know, you could always just flip a coin.
If you’d like to talk about this in more detail, perhaps over a cup of tea, contact me at dan@chartbeat or find me on Twitter.
^{1. The Tigers. Always bet on the Tigers.↩}
^{2. Ignoring, of course, the fraction’s denominator, but that is a bit out of the scope of this post... which is math speak for laziness.↩}
^{3. A note about how to read probability density functions if you are not familiar with them: Probability density functions (PDFs) are truly probability densities; that is, the area under the curve between two values on the x-axis gives the probability that our quantity-of-interest will be between those two points. That’s why the y-axis is so funny. To get the probability, we essentially need to multiply the y-axis value by distance between two values on the x-axis. If that doesn’t make any sense, just know that the best way to think about these distributions is to see where the curve is the most dense—where the greatest area under the curve is. So, the places where the peaks exist are the most probable values. I’ll blog more about distributions in the near future.↩}
“If they click through to a link and then come straight back to Facebook, it suggests that they didn’t find something that they wanted. With this update we will start taking into account whether people tend to spend time away from Facebook after clicking a link, or whether they tend to come straight back to News Feed when we rank stories with links in them.”
Focusing on attention and time is nothing new for Facebook. On its last earnings call, Facebook specifically spoke about the size of their market opportunity in terms of the available time and attention they were able to accrue. On a more practical note, Facebook has been factoring how much time people spend away from Facebook after clicking on an ad into its pricing algorithm for some time now. In some ways, the news today is simply a wider application of that action.
Second, the decision to enable greater previewing of links, effectively giving the visitor more information to decide whether the content is interesting to them, potentially confirms a theory that Chartbeat’s data science team has held. On average, traffic from Facebook spends about 60% more time reading than traffic from Twitter. While there are likely a number of factors in this, the more sophisticated previewing in Facebook is a clear differentiator that we think affects this.
Taken together, these two actions confirm that Facebook is taking its users’ experience incredibly seriously and is leaning more and more on the fundamental concepts of the Attention Web to do so. That’s good news for quality publishers everywhere.
But what does this mean for great short-form content? The one potential challenge to this was raised by Matt Galligan of the excellent news service Circa:
@arctictony I feel like that's just such a short sighted way of thinking about it. @Circa's whole goal is brevity but we're not click bait.
— Matt Galligan (@mg) August 25, 2014
It’s utterly logical to be concerned that content designed for brevity would suffer under this algorithm. However, I think this underestimates the comparative wealth of attention that even content designed to be brief gets. The depressing truth of the Internet is that short-form content hangs out on the same end of the distribution curve of the Internet as long form when it comes to attention.
As I’ve mentioned elsewhere, the majority of pageviews on the Internet get fewer than 15 seconds of engagement. Facebook is looking for those instances when people come ‘straight back’ to the feed, suggesting that the threshold they’ve set for clickbait may be rather low. If your content matches the intent of your headline (i.e., you’re selling what you’re promising), then you’re highly likely to beat Facebook’s threshold even with short form.
Bottom line: Focus on creating quality content, match it with an accurate headline, and you’ll be fine.
              PAGEVIEWS   TWEETS   AVG. ENGAGED TIME
WEBSITE.COM   …           …        …
“ARTICLE”     50          5        30 sec
The first row contains information about all articles on website.com; the second row contains information about one page on the site. So, after looking at this data, I might come up with an insight like the following:
Even though “article” had below average engagement for “website.com,” readers shared this story 5 times more often than the typical story.
Let's break down where this insight came from. We see that “article” had five tweets, but without context, this does not tell us much. A great way to give context to a number is to compare it to a benchmark. For example, how does this number compare to the typical article on this website, or on the Internet as a whole? Put into the context of a larger picture, we can judge whether a number is good or not. In this case, we are given all we need to know about Twitter shares across the site, so let's compare Twitter activity on “article” to the average article on “website.com.” However, since the overall site has much more traffic than “article,” comparing the raw number of tweets for each would be unfair. When comparing numbers, it is important to compare apples to apples. The standard way to deal with this is to normalize your values. In this case, we consider the tweet rate for both, that is, the number of tweets per pageview:
Twitter share rate = number of tweets / pageviews
The table above then becomes:
              TWITTER SHARE RATE
WEBSITE.COM   0.02
“ARTICLE”     0.10
Now we are prepared to ask the following question: Was this page shared a lot? Or, how did the share rate of “article” compare to “website.com”? We answer:
“Article” was shared once per 10 pageviews, 5 times more than the typical article on “website.com."
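In code, this normalization and comparison is a one-liner per article. The article's 5 tweets come from the example; the pageview counts and site totals below are assumptions chosen to reproduce the 5x ratio:

```python
# Tweets-per-pageview normalization. The article's tweet count comes from
# the example; pageview counts and site totals are assumed for illustration.
def twitter_share_rate(tweets: int, pageviews: int) -> float:
    return tweets / pageviews

article_rate = twitter_share_rate(tweets=5, pageviews=50)     # 0.10
site_rate = twitter_share_rate(tweets=200, pageviews=10_000)  # 0.02
print(article_rate / site_rate)  # 5.0: shared 5x more often than typical
```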
This gives us an interesting one-dimensional fact about the article. To get a more complete picture of how this article performed, however, it would be better to consider multiple data points. In this case, we also have access to information about how much time users spent engaging with content, so we can use that as well.
We ask an additional question: Was this story engaging? Or, how did the average engagement of “article” compare to the typical story on “website.com”? We answer:
Readers of “article” spent an average of 30 seconds actively reading content, which is less than the typical story on “website.com.”
As we ask and answer additional questions about the data for “article,” we start to get a more complete picture of the success of the story. In fact, if we combine this information, we start to build a story about our data, and in this case we will end up with something similar to what we stated above.
In summary, we performed a two-step process where we answered two questions:

1. Was this page shared a lot?
2. Was this story engaging?
Since both of these questions have two possible answers, yes or no, we have four total possible scenarios. This can be represented as a decision tree like the following:
For “article” we answered YES to question 1 and NO to question 2. This corresponds to the following path in our decision tree:
Repeating this procedure with another story, we might end up in a different branch of the tree. For example, consider the new data set:
              TWITTER SHARE RATE   AVG. ENGAGED TIME
WEBSITE.COM   …                    …
ARTICLE #1    …                    …
ARTICLE #2    …                    …
When we ask the same series of questions for “article #2”, we would follow this path:
And we could formulate a sentence like the following:
While “article #2” was shared less than the typical story, the content really captured its audience’s attention, with readers spending 32% more time engaged than the typical article.
In fact, we can create a different sentence for each of the four scenarios, so that no matter the situation we find ourselves in, we will have a sentence which describes the data in an interesting way. So, for a general article on “website.com” we could do the following:
Even though X had below average engagement, readers shared this story Y times more often than the typical story.
Even though “article” had below average engagement for “website.com,” readers shared this story 5 times more often than the typical story.
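The four-scenario selection can be sketched as a lookup keyed on the two yes/no answers. The template wording and benchmark values here are illustrative, not Chartbeat's implementation:

```python
# The four branches of the decision tree as a lookup table keyed on the
# answers to (shared a lot?, engaging?). Template wording is illustrative.
TEMPLATES = {
    (True, True): "{name} was both widely shared and highly engaging.",
    (True, False): ("Even though {name} had below average engagement, "
                    "readers shared it more often than the typical story."),
    (False, True): ("While {name} was shared less than the typical story, "
                    "readers spent more time engaged with it than average."),
    (False, False): "{name} underperformed on both sharing and engagement.",
}

def insight(name, share_rate, site_share_rate, engaged_sec, site_engaged_sec):
    shared_a_lot = share_rate > site_share_rate
    engaging = engaged_sec > site_engaged_sec
    return TEMPLATES[(shared_a_lot, engaging)].format(name=name)

# "article": share rate 0.10 vs. an assumed 0.02 site average;
# 30 seconds engaged vs. an assumed 45-second site average:
print(insight('"article"', 0.10, 0.02, 30, 45))
```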
Now we have found a way to automatically generate a basic sentence about tweets and engagement on an article, but what more can we do to make this feel like real insights from a human data scientist?
Above we created one template sentence per branch of the decision tree. A simple trick we can play to give this process a more human touch is to increase the variety in the sentences that are created. For example, we could take the sentence:

Even though “article” had below average engagement for “website.com,” readers shared this story 5 times more often than the typical story.
...and restate it as:
1 of every 10 people who read “article” tweeted about it, 500% more often than the average story on “website.com.” On the other hand, these readers only spent 30 seconds engaging with this content on average.
Rather than writing one template sentence per branch of the decision tree, we can create a collection of templates. This serves to create an illusion of a real data scientist telling you these facts, and will prevent the results from getting stale. We can also use additional data to include related data points. For example, in the case when the story is active on Twitter, we could enhance our original insight in the following way:
Even though “article” had below average engagement, readers shared this story 5 times more often than the typical story. In fact, the tweet from “user” generated 20 pageviews and 100 total seconds of engagement for this story.
Every time a question is asked in the decision tree, if there is additional data available, we can automatically add in extra information to flesh out the narrative.
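A sketch of that template-collection trick, with a phrasing chosen at random per report (the wording is ours, not Chartbeat's):

```python
import random

# Several phrasings for the "shared a lot, low engagement" branch, picked
# at random so repeated reports don't feel canned. Wording is illustrative.
VARIANTS = [
    ("Even though {name} had below average engagement, readers shared it "
     "{ratio:.0f} times more often than the typical story."),
    ("{rate:.0%} of readers of {name} tweeted about it, {ratio:.0f}x the site "
     "average, though they spent just {seconds:.0f} seconds engaged on average."),
]

def render(name, share_rate, site_share_rate, engaged_seconds):
    template = random.choice(VARIANTS)
    return template.format(name=name, rate=share_rate,
                           ratio=share_rate / site_share_rate,
                           seconds=engaged_seconds)

random.seed(1)
print(render('"article"', 0.10, 0.02, 30))
```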
While the example presented was very simple, there are endless possible data points to consider in general. We can extend the method described here to more complex scenarios where there are many more levels to the decision tree with multiple paths at each node.
This is the general framework:

1. Gather your data points, along with benchmarks that put them in context.
2. Normalize values so that comparisons are apples to apples.
3. Ask a series of yes/no questions to follow a path through the decision tree.
4. Select a template sentence for the branch you land in and fill in the numbers.
5. Where additional related data is available, use it to flesh out the narrative.
This was constructed by following the red path through this decision tree in a way that is very similar to the example we walked through above:
So, what do you think? We'd love to hear about your applications of this methodology.