Archive for the ‘Data Science’ Category

When you work with as much data as we do (and trust me, it’s a lot), it’s humbling to show off the actual journalistic output we support. So, we’ve compiled a list of the 20 stories that held your attention longest in 2015, for a grand total of 685,231,333 Engaged Minutes (or more than 1,300 years). These were stories that held you breathless. Enraged you. Inspired you. They were long-form reports, rich with narrative, like #1, 7, 11, and 17, which show that readers really do respond to quality (!!). They were live coverage of the attacks in Paris (#3, 4, 6) or the elections in Britain (#5). They were confessional essays and impassioned arguments, investigations and elegies. These are the stories that prove that digital storytelling isn’t just alive, it’s kicking ass.

1. What ISIS Really Wants

The Atlantic | February

2. The Science of Why No One Agrees on the Color of This Dress

Wired | February

In-depth examinations of global newsmakers topped the list in 2015. Undoubtedly, this was the year of long-form narrative.

3. Paris attacks: as they happened

BBC | November

4. Paris attacks: Bataclan and other assaults leave many dead

BBC | November

5. Election Live

BBC | May

6. Paris massacre: At least 128 killed in gunfire and blasts, French officials say

CNN | November

It goes without saying: Breaking news will always grab and hold attention.

7. Inside Amazon: Wrestling Big Ideas in a Bruising Workplace

The New York Times | August

8. Scott Weiland’s Family: ‘Don’t Glorify This Tragedy’

Rolling Stone | December

9. How One Stupid Tweet Blew Up Justine Sacco’s Life

The New York Times | February

10. Police: Bryce Williams fatally shoots self after killing journalists on air

CNN | August

11. The Lonely Death of George Bell

The New York Times | October

Honed craft. Timeless themes. Notice that these Times pieces are even more examples of the power of narrative journalism.

12. Spygate to Deflategate: Inside what split the NFL and Patriots apart

ESPN | September

13. At least 14 people killed in shooting in San Bernardino; suspect identified

CNN | December

14. The “Food Babe” Blogger is Full of Shit

Gawker | April

15. I Found An iPhone On the Ground and What I Found In Its Photo Gallery Terrified Me

Thought Catalog | April

16. No. 37: Big Wedding or Small?

The New York Times | January

Sometimes, the most engaging content is the most distracting. Readers will engage deeply with more than just serious news items.

17. Split Image

ESPN | May

18. This is Why NFL Star Greg Hardy Was Arrested for Assaulting His Ex-Girlfriend

Deadspin | November

19. The Coddling of the American Mind

The Atlantic | September

20. The Joke About Mrs. Ben Carson’s Appearance Is No Laughing Matter

The Root | September

Want to see how your stories stack up? Get in touch.

Update: a reader wrote in with the great suggestion of examining the effect of direct quotations in headlines. We found that headlines with direct quotes are 14% more likely to win headline tests than average headlines, making them the second most effective headline style we’ve tested. Please comment or get in touch with other suggestions for headline styles to examine!

Writing a catchy headline that captures the attention of your audiences is, without question, an art form. As demonstrated in this headline, blindly following guidelines can lead to copy that sounds cliché at best, and actively off-putting at worst. Still, effective headline writing can make quite a difference in the success of your content (after all, readers have to get to the actual articles somehow), so it can be expensive to get wrong.

Chartbeat Engaged Headline Testing enables content creators and editors to become better headline writers. By testing copy in real time, newsrooms can challenge assumptions about what kinds of headline constructions work well and which don’t.

Accordingly, we would like to turn that introspective lens on some of our own recommendations of how best to use our tool and then on some commonly cited “tips and tricks” for getting the most out of your headlines. As a foreword, while we have the luxury of being able to plot general trends in a rich dataset of over 100 publishers and almost 10,000 headline tests, each publisher and audience is different. We encourage you to take a look at your own data and put some of our findings to the test (literally!) to see what works best for you.

Verifying Best Practices for Engaged Headline Testing

To help our clients get started with our tool, we often give them a list of best practices. Here are a few examples:

  • Test in Higher Traffic Positions
  • Don’t be Afraid to Test Multiple Variants
  • Test Distinct Differences

We like to encourage users to conduct headline tests that converge to a winner quickly, so that winning headlines spend the most possible time with the largest possible audience.

This raises the question of what “converging to a winner quickly” means, and to answer it, I would like to appeal to our data for an overall view. The graph below shows a histogram of experiments by the number of headline trials, that is, the number of unique visitors who see one of the tested headlines:

[Figure: histogram of experiments by number of headline trials]

About half of conclusive experiments (those that determine a winner) need fewer than 2,500 trials to converge. More than 85% need fewer than 10,000 trials. That said, identifying an average convergence time for your site will depend on the amount of traffic you have and how “evergreen” your content is.

For the sake of example, let’s imagine a publisher that gets 100 trials per minute and wants its experiments to finish within 25 minutes. The above statistics imply that only about half of this publisher’s experiments will finish before they reach 25 * 100 = 2,500 trials.


Click-Through Rate
Now, let’s take a look at how we can leverage higher traffic (click-through rate) positions to optimize for convergence time. The following graph is a density plot of number of trials needed for convergence against the CTR of the winning headline:

[Figure: density plot of the number of trials needed for convergence against the CTR of the winning headline]

While there is a fair amount of noise in the plot, the main indication is that the needed number of trials is roughly inversely proportional to the CTR of the slot. So what does this mean in practice? If a publisher tests in a prominent headline position getting 8% CTR on the page, the test will converge in roughly a quarter as many trials as a test in a position below the fold getting 2% CTR. That brings our convergence rate (within 25 minutes) from 50% to closer to 90%. Pretty astounding.
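To make the arithmetic concrete, here is a minimal sketch of the two rules of thumb above. The function names are ours, for illustration only, and the inverse-proportionality rule is an approximation read off a noisy plot, not an exact law.

```python
def trial_budget(trials_per_minute, minutes):
    """Number of trials a publisher accumulates in a given time window."""
    return trials_per_minute * minutes


def scaled_trials(baseline_trials, baseline_ctr_pct, new_ctr_pct):
    """Estimate trials needed in a new slot, assuming trials * CTR stays
    roughly constant (the inverse-proportionality approximation)."""
    return baseline_trials * baseline_ctr_pct / new_ctr_pct


# The example publisher: 100 trials per minute, 25-minute target.
print(trial_budget(100, 25))

# An experiment needing 10,000 trials in a 2% CTR slot, moved to an 8% slot.
print(scaled_trials(10_000, 2, 8))
```

Under that approximation, an experiment that would need 10,000 trials in a 2% CTR slot needs about 2,500 in an 8% slot, which is exactly the example publisher's 25-minute budget.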


Number of Headline Variants
Finally, let’s graph the number of headline variants in each experiment:

[Figure: number of headline variants per experiment]

Right now, we see that more than two-thirds of our headline tests are basic A/B tests, meaning only 2 variants. There are clear pros and cons for testing additional headline options. On the negative side, you need to actually write more headlines, and I can sympathize with the creative burden. (Unfortunately, taking the lazy way out in tweaking a word or rearranging a sentence tends to have less impact than trying to highlight different viewpoints or angles.) Also, adding an additional (average) headline often will hurt convergence time, because you need additional trials to explore the added headline.

[Table: amount by which the winning headline exceeds an average headline, by number of headlines tested]

But, as demonstrated in the table above, there is clear benefit to testing additional headlines as well. The above table shows the amount by which the winning headline exceeds an average headline, by number of headlines tested. The winning headline in a five variant experiment typically has more than a 50% higher CTR than the average headline, whereas you may only see a 23% benefit for a standard A/B test. This pattern of increasing divergence of winner to mean follows directly from the variance in the CTR of each headline. Another consideration is how often the original headline (Variant A) ends up as the winning headline. Admittedly, the following result depends fairly strongly on how organizations decide to come up with headlines; but even in the A/B headline case, publishers have been fairly significantly rewarded for using the additional variant. In some extreme cases, we have seen publishers use as many as 17 (!) different variants in a single headline test, successfully converging in fewer than 10,000 trials (!!).
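One way to see why the winner's edge grows with the number of variants is a small simulation. This sketch assumes, purely for illustration, that each headline's true CTR is drawn independently from a normal distribution with a mean of 5% and a standard deviation of 1%. Those parameters are made up and are not taken from our data, so the output will not match the table; only the upward trend is the point.

```python
import random
import statistics


def winner_lift(num_variants, num_experiments=5_000,
                mean_ctr=0.05, sd_ctr=0.01, seed=0):
    """Average amount by which the best of `num_variants` simulated CTRs
    exceeds the mean CTR of the same experiment (0.2 means 20% higher)."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(num_experiments):
        # Hypothetical CTR distribution; clamp to keep CTRs positive.
        ctrs = [max(rng.gauss(mean_ctr, sd_ctr), 1e-6)
                for _ in range(num_variants)]
        lifts.append(max(ctrs) / statistics.fmean(ctrs) - 1)
    return statistics.fmean(lifts)


for n in (2, 3, 5):
    print(n, round(winner_lift(n), 3))
```

The more headlines you draw from the same distribution, the further the best one tends to sit above the average, even though no individual headline got any better.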

Testing the Efficacy of Common Headline Themes

We wanted to take a closer look at the characteristics that make up a good headline. Some of the essence of a great headline, such as Vincent A. Musetto’s “Headless Body in Topless Bar,” can never be fully captured in categorical variables, but there are tropes that are commonly used to capture audience attention. With the help of headline guides, other headline studies, and raw expertise, we compiled a list of 12 commonly cited themes:

  1. Does the headline contain a question?
  2. Does the headline have a number?
  3. Does the headline use adjectives?
  4. Does the headline use question words (e.g., ‘who’, ‘what’, ‘where’, ‘why’)?
  5. Does the headline use demonstrative adjectives (e.g., ‘this’, ‘these’, ‘that’, ‘those’)?
  6. Does the headline use articles (e.g., ‘a’, ‘an’, ‘the’)?
  7. Is the headline in the 90th percentile of length (73 characters or greater)?
  8. Is the headline in the 10th percentile of length (32 characters or fewer)?
  9. Does the headline contain the name of a person?
  10. Does the headline contain any named entity (e.g., person, place, organization)?
  11. Does the headline use positive superlatives (‘best’, ‘always’)?
  12. Does the headline use negative superlatives (‘worst’, ‘never’)?

For this exercise, we used spaCy (spacy.io) for the natural language processing tasks, including entity recognition and part-of-speech tagging for English-language sites.
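Several of the themes in the list above can be approximated with nothing more than word lists and string checks. The sketch below is a simplified stand-in, not the pipeline we ran: the function and key names are hypothetical, and the themes that need part-of-speech tagging or entity recognition (adjectives, names, named entities) are left out because a word list cannot capture them. Note, for instance, that a plain word match cannot tell "that" as a demonstrative from "that" as a conjunction.

```python
import re

# Word lists taken from the examples in the theme list above.
QUESTION_WORDS = {"who", "what", "where", "why"}
DEMONSTRATIVES = {"this", "these", "that", "those"}
ARTICLES = {"a", "an", "the"}


def headline_features(headline):
    """Return a dict of simple yes/no features for one headline."""
    words = re.findall(r"[a-z']+", headline.lower())
    return {
        "has_question": "?" in headline,
        "has_number": bool(re.search(r"\d", headline)),
        "has_question_word": any(w in QUESTION_WORDS for w in words),
        "has_demonstrative": any(w in DEMONSTRATIVES for w in words),
        "has_article": any(w in ARTICLES for w in words),
        "is_long": len(headline) >= 73,   # 90th percentile cutoff
        "is_short": len(headline) <= 32,  # 10th percentile cutoff
    }


print(headline_features("What ISIS Really Wants"))
```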

There are a number of statistical challenges in trying to sort out what characteristics have real significance and which are spurious outliers. The first thing to note when making multiple significance tests is that it is important to control the familywise error rate, via Bonferroni correction, or else you greatly increase the likelihood of spurious results. The second thing is that there are a number of confounding variables to consider. Raw CTR is appealing for its simplicity, but it could very well be the case that short headlines, for instance, are much more likely to be tested in leaderboard spots at the top of busy homepages, so despite being inferior to other headlines in the same spot, the CTR ends up being higher. This is a form of Simpson’s Paradox.
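For readers who have not met it, the Bonferroni correction simply divides the significance level by the number of tests being run. Here is a minimal sketch; the p-values in the example are invented for illustration and are not results from our dataset.

```python
def bonferroni_threshold(alpha, num_tests):
    """Per-test significance threshold that keeps the familywise
    error rate at or below `alpha` across `num_tests` tests."""
    return alpha / num_tests


def significant(p_values, alpha=0.05):
    """Flag which tests pass the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p <= threshold for name, p in p_values.items()}


# With 12 themes, each test must clear a much stricter bar than 0.05.
print(bonferroni_threshold(0.05, 12))

# Hypothetical p-values for two themes.
print(significant({"theme_a": 0.001, "theme_b": 0.03}))
```

A result with p = 0.03 would look significant on its own, but not once you account for having looked at many themes at the same time.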

We will look at two alternate metrics of headline success. The first is scaled CTR, where instead of comparing CTRs globally, we look at the ratios of CTR of a given headline to the CTR of the headline that won the experiment. With this metric, the average scaled CTR of a headline is close to 77% in this data set, so we use that 77% as a benchmark to see whether a particular property has a beneficial effect.
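As a rough sketch of the scaled CTR calculation for a single experiment (the function name and the input layout are our own choices for this example, and the click counts are invented):

```python
def scaled_ctrs(experiment, winner):
    """Divide each headline's CTR by the CTR of the winning headline.

    `experiment` maps a headline label to a (clicks, trials) pair;
    `winner` is the label of the headline that won the test.
    """
    ctrs = {name: clicks / trials
            for name, (clicks, trials) in experiment.items()}
    return {name: ctr / ctrs[winner] for name, ctr in ctrs.items()}


# Hypothetical A/B test in which headline B won.
print(scaled_ctrs({"A": (40, 1000), "B": (50, 1000)}, winner="B"))
```

Because every headline is measured against the winner in its own slot, a headline tested in a busy leaderboard position no longer looks good just because of where it sat.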

The second metric is winner propensity. We look at the set of experiments that compare headlines with a given property to a headline without and calculate how often we would expect headlines with that property to win, if winners for each experiment were chosen randomly. We then see whether the headlines of the given property are more likely to win.
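Here is one way the winner propensity comparison could be sketched. The data layout is an assumption made for this example: each experiment is a list of flags saying whether each variant has the property, plus the index of the winning variant. Experiments where every variant (or no variant) has the property are skipped, since they compare nothing.

```python
def winner_propensity(experiments):
    """Return (observed win rate, expected win rate under random choice)
    for headlines that have a given property."""
    observed_wins = 0
    expected_wins = 0.0
    mixed = 0
    for has_property, winner_index in experiments:
        with_property = sum(has_property)
        if with_property in (0, len(has_property)):
            continue  # not a with-vs-without comparison
        mixed += 1
        observed_wins += int(has_property[winner_index])
        # If the winner were picked at random, the chance it has the
        # property is the share of variants that have it.
        expected_wins += with_property / len(has_property)
    if mixed == 0:
        raise ValueError("no experiment compares headlines with and "
                         "without the property")
    return observed_wins / mixed, expected_wins / mixed


# Hypothetical experiments: (flags per variant, index of the winner).
experiments = [
    ([True, False], 0),
    ([True, False, False], 1),
    ([True, True], 0),      # skipped: every variant has the property
    ([False, True], 1),
]
print(winner_propensity(experiments))
```

If the observed rate is clearly above the expected rate, headlines with the property win more often than chance would suggest.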

[Table: scaled CTR and winner propensity for each headline theme]

Results
The results were somewhat mixed. Only long headlines and headlines with demonstrative adjectives show significantly higher scaled CTR, and only headlines with demonstrative adjectives and numbers show higher propensity of being declared winner in a given headline test. The presence of articles actually significantly detracts from scaled CTR.

It’s worth discussing the one unambiguous result in a bit more detail. Demonstrative adjectives can actually be used in multiple ways in a headline. You can use them to create intrigue in clickbait-ish fashion: “These simple tricks will leave you speechless” or “You’ve never tasted anything like this.” There are also quite a few examples in our dataset of using demonstrative adjectives as a temporal specifier: “GOP Debate this evening,” for instance. In the future, as we collect more data, we can think about drilling down more granularly into specific constructions.

Perhaps more interesting than the positive results is the lack of significance among other factors that have been cited to be useful in capturing the attention of an audience. “Use terse, punchy headlines”; “Ask questions”; “Name drop.” None of these properties show much predictive power in the general case.

“That’s right, writers: We’ve proven that ‘5 Ways To Write The Best Headline Ever’ isn’t actually that effective.”

Final Thoughts
So where does that leave us? If you want to be an effective headline writer, maybe there is no substitute for creativity and attention. Watch for patterns in the headlines that end up floating to the top. Take the time to discuss what worked and what didn’t. Avoid the formulas and clichés. Be liberal with your use of headline testing, so that you can harness feedback from your readers in real time.

If there are any other ideas that you would like us to take a look at in the data, especially as our repository of tests grows, please don’t hesitate to reach out.

In the meantime, here’s a great resource for headline testing optimization.

If you had to describe five important events that were happening in the world right now, what would they be? How would you even go about answering that question?

To start, you might visit the homepage of your favorite news site, aggregator, or publisher. But just one site won’t have everything you’re looking for — maybe you want different takes on today’s news. What you might do is collate articles across several sites, see which news events multiple publishers are reporting on, and look at different perspectives on each story.

For our Hackweek project, backend engineer Anastasis Germanidis and I developed a process to identify these trending, important, global news events automatically and in real-time, using publicly available data. With a few machine learning algorithms, we can group articles across different sites by news event and output a list of important news events being reported right now, each represented by a set of articles providing different angles on the story.

I’ll first show our results, and then talk about the data science that makes this work. Below, I’ve run our data science pipeline on the home pages of major U.S. publishers, including the New York Times, the Washington Post, and the Wall Street Journal, scraping data from the afternoon of October 13. To be clear, this pipeline does not use any data from Chartbeat’s analytics products – everything we use comes from a web scraper, which sees what any reader on the web would see.

Our project captures the important events of the day through algorithms and provides multiple articles for each news story.


Results: October 13, 2015

News Event 1: Violence in Israel


News Event 2: Kansas City Fire


News Event 3: Democratic Debate


So How Does it Work?

First, we need a dataset of articles to work with. We start by using PhantomJS, an open-source headless browser, to scrape the homepages of several major U.S. publishers, including the New York Times, the Washington Post, and the Wall Street Journal. We want articles that homepage editors think are important to today’s news, so for each page, we look at all article links above the fold on a desktop screen and pick the top ten articles by link size.
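The selection step can be sketched in a few lines. Everything specific here is an assumption for illustration: the record fields (`url`, `top`, `width`, `height`), the 900-pixel fold height, the use of rendered area as "link size", and the example.com URLs. It is not the code we ran.

```python
def top_article_links(links, fold_height=900, limit=10):
    """Keep links that start above the fold and return the URLs of the
    `limit` largest ones, ranked by rendered area."""
    above_fold = [link for link in links if link["top"] < fold_height]
    by_size = sorted(above_fold,
                     key=lambda link: link["width"] * link["height"],
                     reverse=True)
    return [link["url"] for link in by_size[:limit]]


# Hypothetical link boxes as a scraper might report them (pixels).
links = [
    {"url": "https://example.com/a", "top": 100, "width": 600, "height": 300},
    {"url": "https://example.com/b", "top": 1200, "width": 800, "height": 400},
    {"url": "https://example.com/c", "top": 400, "width": 300, "height": 100},
]
print(top_article_links(links))
```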

We feed our article links to Python-Goose, a Python library which extracts the content of an article given its URL. Now we have the title, description, and content of ten articles on each homepage we started with.

We want to organize our dataset of scraped articles into news events. We start by preprocessing our article text with two steps: 1) named entity extraction and 2) tf-idf vectorization. Let me explain:

Named entity extraction

This involves identifying words or phrases that correspond to names of things. We use the MITIE Python library, which identifies the names of people, organizations, and locations and classifies each entity it finds into one of these three categories. For our purposes, we’re less concerned with the classification of each named entity than the identification of these words and phrases. We extract all instances of named entities in each article to use for the next step of our pipeline.

Because news events almost always can be uniquely identified by the names of people, organizations, and locations involved, named entity extraction is an effective way of filtering out relatively unimportant terms while retaining important information — think of it as an extension of stop-word removal.

tf-idf Vectorizer

Scikit-learn’s tf-idf vectorizer transforms our list of named entities into a numerical vector for each article, which allows us to cluster articles with standard clustering algorithms. tf-idf stands for term frequency-inverse document frequency. In this case, term frequency is the number of times a named entity appears in an article divided by the number of total entities in the article. Document frequency is the fraction of articles in our dataset that contain a particular named entity. For a given entity and article, term frequency-inverse document frequency is, in its simplest form, the term frequency divided by the document frequency; scikit-learn’s default implementation uses a smoothed logarithm of the inverse document frequency and normalizes each article’s vector, but the intuition is the same.

Roughly speaking, tf-idf gives a higher weight to entities that appear frequently in the article but less frequently in other articles.

Each dimension of an article’s tf-idf vector represents the tf-idf statistic for a particular word in our vocabulary. In this pipeline, our vocabulary contains all entities that have appeared at least once in our article dataset.
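To show the idea without any library, here is a small sketch that follows the simplified definition given above: term frequency divided by document frequency. It is not what scikit-learn computes; its `TfidfVectorizer` applies a smoothed logarithmic idf and L2 normalization by default, so its numbers will differ. The function name and the toy entity lists are made up for the example.

```python
from collections import Counter


def tfidf_vectors(articles):
    """Build simplified tf-idf vectors.

    `articles` is a list of articles, each a list of named-entity strings.
    Returns the sorted vocabulary and one vector per article.
    """
    vocabulary = sorted({entity for article in articles for entity in article})
    num_articles = len(articles)
    # Fraction of articles that mention each entity at least once.
    doc_freq = {
        entity: sum(1 for article in articles if entity in article) / num_articles
        for entity in vocabulary
    }
    vectors = []
    for article in articles:
        counts = Counter(article)
        total = len(article)
        vectors.append([
            (counts[entity] / total) / doc_freq[entity] if total else 0.0
            for entity in vocabulary
        ])
    return vocabulary, vectors


# Toy input: entities extracted from two hypothetical articles.
vocabulary, vectors = tfidf_vectors([
    ["Israel", "Israel", "Jerusalem"],
    ["Kansas City"],
])
print(vocabulary)
print(vectors)
```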

We cluster our tf-idf article vectors using an algorithm called spectral clustering, again using scikit-learn. Spectral clustering consists of three steps: first, we use the similarity of tf-idf vectors between pairs of articles to construct a similarity matrix of our data. We perform dimensionality reduction on this matrix using an eigenvalue decomposition, and finally use the k-means algorithm on this low-dimensional matrix to obtain our article clusters. We’ve found that for a dataset with 60 articles from six publishers, clustering into seven or eight groups works well.
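The first of those three steps, building the similarity matrix, is easy to sketch by hand. We assume cosine similarity here, which is a common choice for tf-idf vectors but only one possible similarity measure. The eigenvalue decomposition and k-means steps are left to a library and are not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Pairwise similarities between all article vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Three toy article vectors: the first two share an entity, the third does not.
for row in similarity_matrix([[1, 0], [1, 0], [0, 1]]):
    print(row)
```

Articles about the same event share entities, so they end up with high pairwise similarity, and that is the structure the later steps exploit.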

Why didn’t we use a probabilistic topic model such as Latent Dirichlet Allocation? We found that topic models such as LDA give you clusters that roughly correspond to sections, such as technology, science, and politics, and not individual news events. This is perhaps because these algorithms allow for an article to belong to multiple topics instead of forcing a hard classification. This doesn’t make sense if topics are to correspond to news events – we know that an article will rarely report on more than one news story.

Here’s a diagram of our full pipeline.
[Figure: diagram of the full pipeline]

What’s Next?

Recently, Twitter released a product called Moments, which organizes tweets into events using a team of human curators. We want to use our automated process to do the same with news articles, and we’re working towards a web application that displays our news events in real-time.

By using algorithms to evaluate the importance of news stories, we give you an easy way to figure out what’s happening in the world right now — without having to organize articles yourself or even wait for human curators.

My headphones are in, and I’m listening to Jóhann Jóhannsson’s The Miners’ Hymns, one of my favorite albums for coding. I’m finishing up an API for our new Heads Up Display (HUD), for which I’d worked out the math a few days earlier. I had spent the previous day figuring out how to implement the math and testing out edge cases with synthetic data, interspersed between product planning meetings and debugging a performance issue with a new component of the HUD backend. I’m about to put out a Pull Request, when I take a look at Nagios and notice that one of the systems that powers the current HUD has just gone critical. I start to debug, and a second later I get a Slack message from someone on Chartcorps saying that customers are starting to notice that the HUD is down. I see that it is a simple fix this time; I just have to restart one of the services that powers the HUD.

Just in time, too, because I have to head uptown with members of our sales team to talk to one of our strategic clients about our new headline testing product. On my way out the door, one of our designers pulls me aside to look at the current designs for displaying the results of headline tests in the new HUD: “Does this viz accurately represent the data?” We talk for ten minutes, weighing pros and cons and looking at design alternatives. We talk about color schemes. We talk a bit about user interaction.

The meeting uptown goes wonderfully; I give a high-level overview of multi-armed bandit headline testing, answer some technical questions about the product, and get great feedback about the product to take back to the team. When I get back in the office, I see a message from Lauryn Bennett, our Head of Brand, asking if any of us on the Data Science team have time to answer a request from a journalist about a news event that just happened. This particular request doesn’t require an in-depth statistical analysis, so I write a quick script to pull the numbers. I spend a bit of time looking at the results and then write up a few paragraphs describing what I’ve found. I then head into a meeting with fellow engineers, designers, and product owners to plan our next Sprint.

This is my typical day.


#DataScience

According to the Harvard Business Review, data science is the sexiest job of the 21st century. If you have data, you need a data scientist to get value from it; data scientists are the only ones who can wrangle #BigData into submission. Apparently, data science will save us all.

I’ve read many pieces over the past year trying to describe what data science actually is. There’s usually some talk about math and programming, machine learning, and A/B testing. Essentially these pieces boil down to one observation: data scientists do something with data. #DeepLearning anyone? I’ve followed arguments on Twitter and blogs about who should and should not be considered a data scientist. Is Data Science even a new discipline? How does it differ from Statistics? Programming? Or is it this…

¯\_(ツ)_/¯

Ok, then, what the hell does a data scientist actually do?

Now this is a question I can answer. And since I haven’t read many concise descriptions of what data scientists do day-to-day, I figured that I’d throw my hat into the ring and talk about the kind of data science we do here at Chartbeat.
“WARNING: it may be a bit different than what you might have heard that data scientists typically do for a living.”

OK, so, what exactly do you do?

Our team here at Chartbeat is made up of what I like to call Product-Centered Data Scientists, meaning the majority of things we do on a daily basis are in direct support of our products. Because we are a data company, our role is pretty central to the organization. Of course, we do math. We build data pipelines and write production code. We do all kinds of analyses. But we also work regularly with sales and marketing. We go on customer visits and help out with sales calls. We even participate in user research with our designers, UX, and product owners.

Engineering

As a tech company, we build software products. Plain and simple. As a data company, every one of those products has a data science need. Because of this, our team is embedded within the engineering team, and most of us take on heavy backend or front-end roles in putting code into production. We don’t just hand prototypes over to engineering for them to implement. We do the implementation. We tune our Redshift clusters, find API performance bottlenecks, choose the proper data structures. We are also part of the backend on-call rotation. If Chartbeat were to break at 2AM, we’d help fix it.

For example, just consider our Engaged Headline Testing tool. Andy Chen and Chris Breaux have been instrumental in designing, building, and maintaining the systems that power headline testing. Andy worked out the initial math for adding Engaged Time into the multi-armed bandit framework and was one of two people who built the initial backend. Chris Breaux has since taken over the Data Science role on the team and continues to push the math, and the product, to new places. The new features that will be released soon in that product are, in no uncertain terms, data science features.

In fact, all of us play central roles in each of the products with which we are associated. Josh Schwartz and Justin Mazur have built an enormous portion of our Ads Suite, Kris Harbold and Josh have built all of Report Builder, and Kris holds the distinction of being our only team member to have both front-end and backend code in production. Justin and I have worked on our Video Dashboard, and I’ve built a lot of the HUD. Each of us has contributed countless lines of code to all sorts of systems across Chartbeat.

“I don’t think it is an exaggeration for me to say that there is not a part of Chartbeat code that a data scientist has not touched.”

Math

Okay, so we do math. This just comes with the territory. Sometimes sophisticated math, sometimes rote math. This math is either in direct support of a product or is part of an analysis we’re working on. We don’t do math every day, but when math is needed, we are there to answer the call.

Research + Analysis

Analysis is typically thought of as an essential skill of a data scientist, and we definitely do our fair share. These analyses range from customer-specific reports to industry-wide analyses to analyses that inform a specific product build. Take, for example, the analysis Chris Breaux did on Dark Social traffic, or the countless studies Josh Schwartz, our Chief Data Scientist, has published on our blog and elsewhere. Or take, for instance, the research that Justin and Chris recently did towards updating our Engaged Time measurement methodology, the work Kris and I published on user journeys through websites, or the work Jeiran Jahani is doing to break new ground in topic detection. If there is a question that we can use our data to answer, we’ve likely been tasked with answering it. Sometimes our analyses take a few minutes; sometimes they take a few weeks. Sometimes we have to dig deep into our bag of tricks and pull out sophisticated statistical tools. Sometimes we write simple SQL queries to calculate averages.

User Interviews + Ethnographic Research

With our product designers and product managers, some of us on the data science team sit in on user interviews and do ethnographic research. This is not something that I’ve seen as common to data scientists at other organizations, but I think it is an incredibly important activity for a product data scientist to participate in.

I know a lot of data scientists and engineers who roll their eyes at this kind of stuff, but understanding user goals helps in the design of a data pipeline, the choice of an algorithm, or the decision for which metric is best for a given application. It makes you empathetic to your user base, which is never a useless endeavor. What product-centered data scientists do is try to keep in our heads at all times exactly what has to happen to create an amazing user experience.

“From the ugly, messy data at the start of the pipeline, to the user’s interaction with the tool, the user is interacting with data, and that has to be in our purview.”

These interviews also give context for where you can be lax with assumptions, because you often have to make trade-offs when you try to implement your fancy models. Sometimes all that great math adds one second to the response time of an API, and when you have traffic like ours, sometimes you can’t afford one second. Knowing the fidelity that your users expect or require helps solve this problem.

When we were redesigning the HUD, I sat in a variety of newsrooms with one of our designers and watched editors work. We simply watched them use our product in their day-to-day flow, and asked questions now and again about what they were doing. I also sat in a few user interviews during this time and have since sat in on countless others. Those experiences have influenced the engineering and data design of the HUD, as well as several other products I’ve helped build. And now, I can’t imagine being part of a product build without having done at least some user research.

Ideation + Future Products

Product-centered data science is not all about maintaining current systems or developing feature increments. There is also a large amount of long-term vision thinking. What will our products look like next year? In the next five years? Often, our team will prototype systems to test the feasibility of an idea or a product direction. We comb through the latest data science papers and computer science literature to see if any of the latest findings can be applied to future (or current) products. Once every six weeks, we set aside a week for our entire team to do data-specific projects that aren’t directly connected to current projects. We’ve built some cool stuff: a system that scrapes and searches content across our network, a tool that discovers popular stories in the news, a deep recurrent neural net to predict traffic, a Slackbot recommendation engine, you name it.

Sales + Marketing

Not only do we help design and build the products, but we do what we can to help sell them, too. We’ll often pull customer-specific numbers, industry benchmarks, or even do full-on reports for the sales team to use on sales calls. Sometimes we’ll even sit in on those calls and other client visits. We write blog posts and our Data Science Quarterly, which help the marketing team grow our customer base. We write product white papers. We give interviews to reporters. Basically, we are tasked with speaking to whomever on behalf of Chartbeat Data.

Product-Centered Data Science

This is product-centered data science, at least here at Chartbeat. Personally, I think every product team should have a data scientist on it. Data science is about storytelling, and so are product design, sales, and marketing. There are so many intersections in thinking that it just seems natural for us to be involved in all these parts of the business. I might be in the minority, but for me, data science really has nothing to do with #BigData. It has nothing to do with machine learning. It might not have anything to do with statistics. It is about asking questions, developing user empathy, creating an experience, and telling a story. Our medium is data, our medium is code, but the outcome is fantastic product experiences.

We’re always looking for great storytellers: whether data scientists, account managers, or backend engineers. Come join us.

You’ve got the quality content. You’ve got the attention of your audience. You’ve got the metrics to prove it. But how do you go about monetizing that attention?

One viable answer to that question lies in Engaged Ad Refresh. By loading new ads into positions that have already been actively exposed to readers for a significant amount of time, Engaged Ad Refresh creates additional premium ad inventory to sell to advertisers. In other words: high quality content rewarded by commensurate ad revenue.
