Archive for the ‘Life at Chartbeat’ Category

My headphones are in, and I’m listening to Jóhann Jóhannsson’s The Miners’ Hymns — one of my favorite albums for coding. I’m finishing up an API for our new Heads Up Display (HUD), for which I’d worked out the math a few days earlier. I had spent the previous day figuring out how to implement that math and testing edge cases with synthetic data, in between product planning meetings and debugging a performance issue in a new component of the HUD backend. I’m about to put out a pull request when I take a look at Nagios and notice that one of the systems powering the current HUD has just gone critical. I start to debug, and a second later I get a Slack message from someone on Chartcorps saying that customers are starting to notice that the HUD is down. It’s a simple fix this time; I just have to restart one of the services that powers the HUD.

Just in time, too, because I have to head uptown with members of our sales team to talk to one of our strategic clients about our new headline testing product. On my way out the door, one of our designers pulls me aside to look at the current designs for displaying the results of headline tests in the new HUD: “Does this viz accurately represent the data?” We talk for ten minutes, weighing pros and cons and looking at design alternatives. We talk about color schemes. We talk a bit about user interaction.

The meeting uptown goes wonderfully; I give a high-level overview of multi-armed bandit headline testing, answer some technical questions about the product, and get great feedback to take back to the team. When I get back to the office, I see a message from Lauryn Bennett, our Head of Brand, asking if anyone on the Data Science team has time to answer a request from a journalist about a news event that just happened. This particular request doesn’t require an in-depth statistical analysis, so I write a quick script to pull the numbers. I spend a bit of time looking at the results and then write up a few paragraphs describing what I’ve found. I then head into a meeting with fellow engineers, designers, and product owners to plan our next Sprint.

This is my typical day.


According to the Harvard Business Review, data science is the sexiest job of the 21st century. If you have data, you need a data scientist to get value from it; data scientists are the only ones who can wrangle #BigData into submission. Apparently, data science will save us all.

I’ve read many pieces over the past year trying to describe what data science actually is. There’s usually some talk about math and programming, machine learning, and A/B testing. Essentially, these pieces boil down to one observation: data scientists do something with data. #DeepLearning anyone? I’ve followed arguments on Twitter and blogs about who should and should not be considered a data scientist. Is data science even a new discipline? How does it differ from statistics? Programming?


Ok, then, what the hell does a data scientist actually do?

Now this is a question I can answer. And since I haven’t read many concise descriptions of what data scientists do day-to-day, I figured that I’d throw my hat into the ring and talk about the kind of data science we do here at Chartbeat.
“WARNING: it may be a bit different from what you’ve heard data scientists typically do for a living.”

OK, so, what exactly do you do?

Our team here at Chartbeat is made up of what I like to call Product-Centered Data Scientists — meaning the majority of what we do day to day is in direct support of our products. Because we are a data company, our role is pretty central to the organization. Of course, we do math. We build data pipelines and write production code. We do all kinds of analyses. But we also work regularly with sales and marketing. We go on customer visits and help out with sales calls. We even participate in user research with our designers, UX, and product owners.


Engineering

As a tech company, we build software products. Plain and simple. As a data company, every one of those products has a data science need. Because of this, our team is embedded within the engineering team, and most of us take on heavy backend or frontend roles in putting code into production. We don’t just hand prototypes over to engineering to implement; we do the implementation ourselves. We tune our Redshift clusters, find API performance bottlenecks, and choose the proper data structures. We are also part of the backend on-call rotation: if Chartbeat were to break at 2 AM, we’d help fix it.

For example, consider our Engaged Headline Testing tool. Andy Chen and Chris Breaux have been instrumental in designing, building, and maintaining the systems that power headline testing. Andy worked out the initial math for adding Engaged Time into the multi-armed bandit framework and was one of the two people who built the initial backend. Chris has since taken over the data science role on the team and continues to push the math, and the product, to new places. The features that will soon be released in that product are — in no uncertain terms — data science features.
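To make the bandit idea concrete, here is a minimal, purely illustrative sketch of multi-armed bandit headline selection using Thompson sampling on click-through rates. The headline names and counts are made up, and this is not Chartbeat’s actual implementation — in particular, it omits the Engaged Time weighting that makes our product distinctive.

```python
import random

def pick_headline(stats):
    """Thompson sampling: draw a plausible click-through rate for each
    headline from its Beta posterior, then serve the headline whose draw
    is highest. Exploration falls off naturally as evidence accumulates."""
    best_headline, best_draw = None, -1.0
    for headline, (clicks, views) in stats.items():
        # Beta(clicks + 1, misses + 1) is the posterior under a uniform prior.
        draw = random.betavariate(clicks + 1, views - clicks + 1)
        if draw > best_draw:
            best_headline, best_draw = headline, draw
    return best_headline

# Hypothetical counts: (clicks, views) observed so far for each variant.
stats = {
    "Headline A": (30, 1000),
    "Headline B": (55, 1000),
}
```

Over many requests, the better-performing headline gets served more and more often, while weaker variants still receive occasional traffic until the data rules them out.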

In fact, all of us play central roles in each of the products we’re associated with. Josh Schwartz and Justin Mazur have built an enormous portion of our Ads Suite; Kris Harbold and Josh have built all of Report Builder; and Kris holds the distinction of being our only team member with both frontend and backend code in production. Justin and I have worked on our Video Dashboard, and I’ve built a lot of the HUD. Each of us has contributed countless lines of code to all sorts of systems across Chartbeat.

“I don’t think it is an exaggeration for me to say that there is not a part of Chartbeat code that a data scientist has not touched.”


Math

Okay, so we do math. This just comes with the territory. Sometimes it’s sophisticated math, sometimes rote math, either in direct support of a product or as part of an analysis we’re working on. We don’t do math every day, but when math is needed, we are there to answer the call.

Research + Analysis

Analysis is typically thought of as an essential skill of a data scientist, and we definitely do our fair share. These analyses range from customer-specific reports to industry-wide studies to analyses that inform a specific product build. Take, for example, the analysis Chris Breaux did on Dark Social traffic, or the countless studies Josh Schwartz, our Chief Data Scientist, has published on our blog and elsewhere. Or take the research that Justin and Chris recently did toward updating our Engaged Time measurement methodology, the work Kris and I published on user journeys through websites, or the work Jeiran Jahani is doing to break new ground in topic detection. If there is a question our data can answer, we’ve likely been tasked with answering it. Sometimes our analyses take a few minutes; sometimes they take a few weeks. Sometimes we have to dig deep into our bag of tricks and pull out sophisticated statistical tools; sometimes we write simple SQL queries to calculate averages.

User Interviews + Ethnographic Research

With our product designers and product managers, some of us on the data science team sit in on user interviews and do ethnographic research. I haven’t seen this as common among data scientists at other organizations, but I think it is an incredibly important activity for a product data scientist to take part in.

I know a lot of data scientists and engineers who roll their eyes at this kind of thing, but understanding user goals informs the design of a data pipeline, the choice of an algorithm, and the decision about which metric best fits a given application. It also makes you empathetic to your user base, which is never a useless endeavor. What we product-centered data scientists try to do is keep in our heads, at all times, exactly what has to happen to create an amazing user experience.

“From the ugly, messy data at the start of the pipeline, to the user’s interaction with the tool, the user is interacting with data, and that has to be in our purview.”

These interviews also give context for where you can be lax with assumptions, because you often have to make trade-offs when implementing your fancy models. Sometimes all that great math adds a second to the response time of an API, and with traffic like ours, sometimes you can’t afford that second. Knowing the fidelity your users expect or require helps you make those trade-offs.

When we were redesigning the HUD, I sat in a variety of newsrooms with one of our designers and watched editors work. We simply watched them use our product in their day-to-day flow, asking questions now and again about what they were doing. I also sat in on a few user interviews during that time and have since sat in on countless others. Those experiences have influenced the engineering and data design of the HUD, as well as several other products I’ve helped build. Now I can’t imagine being part of a product build without having done at least some user research.

Ideation + Future Products

Product-centered data science is not all about maintaining current systems or developing feature increments. There is also a large amount of long-term vision thinking. What will our products look like next year? In the next five years? Often our team will prototype systems to test the feasibility of an idea or a product direction. We comb through the latest data science and computer science literature to see whether any recent findings can be applied to future (or current) products. And once every six weeks, we set aside a week for the entire team to work on data-specific projects that aren’t directly connected to current product work. We’ve built some cool stuff — a system that scrapes and searches content across our network, a tool that discovers popular stories in the news, a deep recurrent neural net to predict traffic, a Slackbot recommendation engine — you name it.

Sales + Marketing

Not only do we help design and build the products, but we do what we can to help sell them, too. We’ll often pull customer-specific numbers, compute industry benchmarks, or even put together full-on reports for the sales team to use on sales calls. Sometimes we’ll sit in on those calls and other client visits ourselves. We write blog posts and our Data Science Quarterly, which help the marketing team grow our customer base. We write product white papers. We give interviews to reporters. Basically, we are tasked with speaking on behalf of Chartbeat Data to whoever needs to hear from us.

Product-Centered Data Science

This is product-centered data science — at least here at Chartbeat. Personally, I think every product team should have a data scientist on it. Data science is about storytelling, and so are product design, sales, and marketing. There are so many intersections in thinking that it seems only natural for us to be involved in all these parts of the business. I might be in the minority, but for me, data science really has nothing to do with #BigData. It has nothing to do with machine learning. It might not even have anything to do with statistics. It is about asking questions, developing user empathy, creating an experience, and telling a story. Our medium is data, our medium is code, but the outcome is a fantastic product experience.

We’re always looking for great storytellers: whether data scientists, account managers, or backend engineers. Come join us.

Happy National Play-Doh Day!

September 16th, 2015 by Jared

Happy Wednesday, people of the world!

Today, Chartbeat celebrated National Play-Doh Day. Bright and early this morning, we each found a single canister of Play-Doh placed on our desks. A competition was held for the best sculpture, and it quickly became obvious that some pretty gifted sculptors were in our midst. Here are some of the finished products:

The Sculptures


Some team members were more … abstract:

Like Play-Doh? Like data? Come join the team. We’re hiring.

What’s On Our Desks

August 18th, 2015 by Ashley

This week on the Design Blog: find out which books each member of the design team keeps on their desk for reference, fun, and inspiration.

Envisioning Information by Edward Tufte

The Design of Everyday Things by Don Norman

HTML & CSS: design and build websites by Jon Duckett

Thinking with Type by Ellen Lupton
Infographics Design by BNN

Universal Principles of Design by William Lidwell, Kritina Holden, & Jill Butler
Type on Screen by Ellen Lupton
The Art of Tim Burton by Leah Gallo

“I am constantly inspired by the artistic concept of Tim Burton. Having the work of those whom you admire near to hand is so important to me.” — Avi, Senior Marketing Designer

The Wall Street Journal Guide To Information Graphics by Dona M. Wong
Just Enough Research by Erika Hall
The Shape of Design by Frank Chimero

And this week’s bookworm winner: Collin!
Universal Methods of Design by Bella Martin and Bruce Hanington
Strange Plants by Zioxla
Typographie by Emil Ruder
Vampires in the Lemon Grove by Karen Russell

Editor’s Pick: “Vampires in the Lemon Grove shines. It makes you want to jump up and sing, to spend the rest of your life trying to be Karen Russell.” — Jared, Marketing Associate.

The Elements of Typographic Style by Robert Bringhurst
Designing Information by Joel Katz
Information Dashboard Design: Displaying Data for At-a-Glance Monitoring by Stephen Few
Resonate: Present Visual Stories That Transform Audiences by Nancy Duarte
2013 Feltron Annual Report
Fox 8 by George Saunders

P.S. Want a desk of your own? We’re hiring: Chartcorps, Product Design, and Marketing Design. Check out the openings here.


Chartbeat Annual Report

2013 was a pretty fantastic year for online publishing. You hit huge concurrent numbers and created some amazing content that kept readers coming back for more.

As a thank you for including Chartbeat in your year, we wanted to pull together some aggregated data highlights for you to celebrate because, let’s face it, we couldn’t have done it without you.

Check out the 2013 Chartbeat Annual Report here.

Hannah Keiler was our Fall 2013 Data Science intern here at Chartbeat, working with Chief Data Scientist Josh Schwartz. Hannah is a senior at Columbia University, where she studies Statistics with a concentration in Computer Science. This blog post details one of several projects she tackled during her internship at Chartbeat.

At Chartbeat, we sometimes want to compare metrics across similar sites, and there are several ways to group sites. You might start by grouping sites by size, using metrics like the number of readers or the number of articles published each day. We were also interested in grouping together sites that write about similar content. Manually grouping thousands of domains by content is incredibly tedious, so we wanted to devise a metric that would let us group similar sites automatically.

One way to define sites as having similar content is that they write on similar subjects at around the same time. If sites write about the same subjects, they are probably using the same keywords, like “Obama” or “Syria.” We also knew that the words that best summarize an article’s content are likely the words appearing in its headline. Keeping these ideas in mind, we developed our metric.

Computing Similarity

We start by comparing sites two at a time. Let’s call the sites A and B. We look at the words used in the headlines in A and B day by day.

For each day, we record the words used by both A and B and compute a weighted sum of their counts: we divide the number of times each shared word occurs in both A and B that day by a number indicating how often that word occurs in headlines in general. Weighting the counts this way gives more weight to rarer words, which helps us pick out pairs of sites that write about the same niche topics. We sum these weighted values over all shared words for the day, and then over all days. Let’s call this final sum “Value 1.”

We also record all of the words used in headlines by either A or B each day. For each day we compute a weighted sum of these word counts, using the same weights as before, and then add up the daily sums into one value. Let’s call this “Value 2.”

Then we divide Value 1 by Value 2. The result is a ratio of the (weighted) number of words A and B share to the (weighted) number of words they use in total.
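The ratio described above can be sketched in a few lines of Python. This is one plausible reading of the description, not the production code: per-day headline word counts come in as `Counter`s aligned by day, and `background` plays the role of “how often does this word occur in headlines in general.”

```python
from collections import Counter

def site_similarity(days_a, days_b, background):
    """Weighted ratio of shared headline words to all headline words.

    days_a, days_b: lists of per-day word-count Counters, aligned by day.
    background: Counter of each word's overall frequency in headlines,
    used to down-weight common words and up-weight niche ones.
    """
    shared_weight = 0.0  # "Value 1": words used by both sites that day
    total_weight = 0.0   # "Value 2": words used by either site that day
    for day_a, day_b in zip(days_a, days_b):
        for word in set(day_a) | set(day_b):
            # Divide the day's counts by the word's general frequency.
            weight = (day_a[word] + day_b[word]) / background.get(word, 1)
            total_weight += weight
            if word in day_a and word in day_b:
                shared_weight += weight
    return shared_weight / total_weight if total_weight else 0.0
```

Under this reading, two sites with identical headlines score 1.0 and sites with no overlapping words score 0.0, with rare shared words pulling the score up faster than common ones.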

How does this look?

We first computed the similarity metric for sites whose content we thought was geared toward sports, music, or celebrity/entertainment news. To visualize the metric, we plotted the sites as nodes in a connected graph.



FYI: These graphs are anonymized because we don’t share individual client data.

The distance between sites represents their similarity: closer sites have a stronger similarity metric. On this graph, the sports sites are dark blue, the celebrity sites are red, and the music sites are teal. As you can see, sites with similar content group together! The fact that the celebrity sites sit in the middle implies that they share some content with both music and sports sites, which makes sense. The outlier posts fewer articles daily than the other celebrity news sites, so there was less overlap in term usage and, accordingly, its similarity metric was lower.

We also tried out our metric on British and Australian news sites, which gives the graph below.



Here, the UK sites (in red) group together and the Australian sites (in teal) group together. The outlier writes more niche stories than general Australian news, so it had less overlap with the other Australian and British news sites.

Wrapping Up

These initial results show that sites that post articles on the same topics in their headlines at around the same time tend to be similar types of sites. Moving ahead, this could be a great way to group sites into categories based on their content.