Archive for the ‘Life at Chartbeat’ Category

It’s no secret that here at Chartbeat, we’re a little dog-obsessed.

So last week, we teamed up with our friends over at Social Tees Animal Rescue and had them bring in some puppies for us to play with for a few hours. As you can see in the pictures below, we had too much fun… And the best part is, they’re all available for fostering or adoption!


About Social Tees

Social Tees Animal Rescue is a not-for-profit, strictly no-kill 501(c)(3) organization in the East Village of NYC that takes abandoned animals from kill shelters and provides them with safe haven and veterinary care before placing them in proper forever homes. They rescue, rehabilitate, and place over 3,000 dogs, cats, birds, and exotics per year.

Social Tees Animal Rescue relies heavily on benevolent donations from the public, 100 percent of which are used directly to save animals’ lives.


Photo Credit: Burton DeWilde


Check out Social Tees on Instagram and Facebook. All information about donations can be found on the Social Tees website.

At Chartbeat, we love ourselves some tech nerdery. We like computer things, internet things, and all the latest gadget things. But we also know that not everyone’s like that, so some of the hacks that we use to make our Chartbeat lives easier might not be immediately obvious to everyone.

That’s why the Chartcorps wanted to spread the insider knowledge on a few ways to make all of your Chartbeat tools more accessible.

Pull up your Dashboard with quick search

Ever wanted to quickly check your Dashboard, but forgot where you bookmarked it? Or you manage a couple different Dashboards under your account and you forgot how to switch between them? Just a fan of typing in general?

By adding a simple custom search engine to your browser, you can give yourself a hotkey built right into your browser’s URL bar. And it’s really easy to set up. Here are the instructions for Chrome:

  1. Go to your Chrome Settings Page by using the dropdown menu in the upper right-hand corner, or just opening a new tab and pressing “command + comma” or “ctrl + comma.”
  2. Midway down the page there’s a section called ‘Search’ and a ‘Manage Search Engines’ button. Select that.
  3. In the modal that comes up, scroll down to where it says ‘Other Search Engines’ and at the bottom where you see three blank boxes, put “Chartbeat Publishing” in the first, “cbp” in the second, and in the third this URL:

    Use the Video Dashboard? Try this URL:

    Now whenever you type ‘cbp’ into your Chrome URL bar and hit either tab or space, you should see it transform into a little blue box. Then type the domain of the Dashboard you want (with the ‘.com’ at the end) and hit enter. Boom. You automatically navigate directly to your Dashboard.
    Now of course this only works when you’re logged in and if you have permission to access that Dashboard, but it’s still a super easy way to pull up exactly what you need in only a few keystrokes.

Filter and Favorite

The Chartbeat Dashboard is designed to be customized for your audience development goals, whether you’re focused on social media, a particular section, or your mobile readers.

If you’re one of those people who really focuses on a specific segment of your audience, why not bookmark the filter you want so you can go directly to it?

Let’s take a social media editor, for instance. We all know that if you only want to see visitors who arrived from a social media source, you just select the “Social” traffic source in the upper right-hand corner.

Selecting that filter changes the URL (see how it adds that ‘#referrer-type=social’ bit at the end?), so you can bookmark the page and jump directly to this view.

Try it out with any Dashboard filter (mobile, new visitors, Twitter visitors, ranking top stories by engaged time) or even a combination of filters.

Looking for more tech tips? Reach out to the Chartcorps at or @Chartcorps on Twitter.

Today, I’m thrilled to finally announce that Betsy Morgan, former CEO of The Huffington Post and most recently CEO of TheBlaze, is joining the Chartbeat Board of Directors.

I say finally because it feels as though Betsy has been advising me and Chartbeat since day one. She’s been an incredible partner, sharing her experience and expertise generously with the Chartbeat team.

In fact, years ago, Betsy contributed a guest post for Chartbeat in which she said something that I think about often:

“Data needs to be used as a conversation starter – a way of getting people to think about things in a different way.”

I think of Betsy as the conversation starter for the media industry. She gets us to think about content, data, and business in a different way. There are few, if any, people in the same league as Betsy in our industry. Her experience is unparalleled.

Betsy led TheBlaze for the last five years, where she grew the site’s audience from five to thirty million uniques a month, while simultaneously building TheBlaze TV’s IP-delivered subscription model, which has since become the industry standard for many celebrity-based content businesses. Prior to leading The Huffington Post, Betsy spent ten years at CBS where she was Senior Vice President for CBS Interactive and the General Manager of

We share Betsy with LearnVest, Goldieblox, Zemanta, TheSkimm, and Sidewire, among others, where she’s an advisor, and with Colby College and Mentoring USA, where she’s on the Board of Trustees and the Advisory Board, respectively. She also serves on the Media Council of Springboard.

Unsurprisingly, Betsy has been repeatedly listed on Business Insider’s “Silicon Alley 100,” and was named one of Business Insider’s 27 “Game Changers” of 2011. In 2012, she was named one of “20 Women to Watch” by the Columbia Journalism Review.

To say we’ll learn a lot from her is an understatement.

Thanks for coming aboard, Betsy.

If you had to describe five important events that were happening in the world right now, what would they be? How would you even go about answering that question?

To start, you might visit the homepage of your favorite news site, aggregator, or publisher. But just one site won’t have everything you’re looking for — maybe you want different takes on today’s news. What you might do is collate articles across several sites, see which news events multiple publishers are reporting on, and look at different perspectives on each story.

For our Hackweek project, backend engineer Anastasis Germanidis and I developed a process to identify these trending, important, global news events automatically and in real time, using publicly available data. With a few machine learning algorithms, we can group articles across different sites by news event and output a list of important news events being reported right now, each represented by a set of articles providing different angles on the story.

I’ll first show our results, and then talk about the data science that makes this work. Below, I’ve run our data science pipeline on the home pages of major U.S. publishers, including the New York Times, the Washington Post, and the Wall Street Journal, scraping data from the afternoon of October 13. To be clear, this pipeline does not use any data from Chartbeat’s analytics products – everything we use comes from a web scraper, which sees what any reader on the web would see.

Our project captures the important events of the day through algorithms and provides multiple articles for each news story.

Results: October 13, 2015

News Event 1: Violence in Israel

News Event 2: Kansas City Fire

News Event 3: Democratic Debate

So How Does it Work?

First, we need a dataset of articles to work with. We start by using PhantomJS, an open-source headless browser, to scrape the homepages of several major U.S. publishers, including the New York Times, the Washington Post, and the Wall Street Journal. We want articles that homepage editors think are important to today’s news, so for each page, we look at all article links above the fold on a desktop screen and pick the top ten articles by link size.
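
The selection step can be sketched in a few lines of Python. This is only an illustration, not our scraper code: the `Link` record, its field names, the 800-pixel fold height, and the reading of "link size" as rendered area are all assumptions made for the example.

```python
from dataclasses import dataclass


# Hypothetical record for a link found on a rendered homepage. The field
# names are assumptions for this sketch, not the original scraper's output.
@dataclass
class Link:
    url: str
    top: float     # distance from the top of the page, in pixels
    width: float   # rendered width, in pixels
    height: float  # rendered height, in pixels

    @property
    def area(self) -> float:
        return self.width * self.height


def top_links(links, fold_px=800, limit=10):
    """Keep links that start above the fold, largest rendered area first."""
    above_fold = [link for link in links if link.top < fold_px]
    ranked = sorted(above_fold, key=lambda link: link.area, reverse=True)
    return [link.url for link in ranked[:limit]]


# Made-up links for illustration only.
links = [
    Link("https://example.com/lead-story", top=120, width=600, height=80),
    Link("https://example.com/sidebar-item", top=300, width=200, height=20),
    Link("https://example.com/footer-item", top=2400, width=900, height=90),
]
print(top_links(links))
```

The footer link is the largest of the three, but it sits below the assumed fold, so it never makes the list.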

We feed our article links to Python-Goose, a Python library which extracts the content of an article given its URL. Now we have the title, description, and content of ten articles on each homepage we started with.

We want to organize our dataset of scraped articles into news events. We start by preprocessing our article text with two steps: 1) named entity extraction and 2) tf-idf vectorization. Let me explain:

Named entity extraction

This involves identifying words or phrases that correspond to names of things. We use the MITIE Python library, which identifies the names of people, organizations, and locations and classifies each entity it finds into one of these three categories. For our purposes, we’re less concerned with the classification of each named entity than with the identification of these words and phrases. We extract all instances of named entities in each article to use for the next step of our pipeline.

Because news events can almost always be uniquely identified by the names of the people, organizations, and locations involved, named entity extraction is an effective way of filtering out relatively unimportant terms while retaining important information. Think of it as an extension of stop-word removal.
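
To make the filtering idea concrete, here is a minimal sketch of what happens after an entity extractor has run. It does not call MITIE itself. The `(token_range, tag)` shape of each entity is an assumption for this example, and the sentence and its spans are hand-written rather than real extractor output.

```python
def entity_strings(tokens, entities):
    """Join the tokens covered by each entity span into a single term.

    `entities` is assumed to be a list of (token_range, tag) pairs; the tag
    is ignored because we only care which phrases are named entities.
    """
    terms = []
    for token_range, _tag in entities:
        terms.append(" ".join(tokens[i] for i in token_range))
    return terms


# Hand-written example sentence and spans, for illustration only.
tokens = "Hillary Clinton and Bernie Sanders debated in Las Vegas on Tuesday".split()
entities = [
    (range(0, 2), "PERSON"),
    (range(3, 5), "PERSON"),
    (range(7, 9), "LOCATION"),
]
print(entity_strings(tokens, entities))
```

Everything that is not inside an entity span ("and", "debated", "on", and so on) is dropped, which is why this behaves like a more aggressive form of stop-word removal.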

tf-idf Vectorizer

Scikit-learn’s tf-idf vectorizer transforms our list of named entities into a numerical vector for each article, which allows us to cluster articles with standard clustering algorithms. tf-idf stands for term frequency-inverse document frequency. In this case, term frequency is the number of times a named entity appears in an article divided by the total number of entities in the article. Document frequency is the fraction of articles in our dataset that contain a particular named entity. For a given entity and article, tf-idf in its simplest form is the term frequency divided by the document frequency. (Scikit-learn’s implementation applies a smoothed logarithm to the inverse document frequency and, by default, normalizes each vector, but the intuition is the same.)

Roughly speaking, tf-idf gives a higher weight to entities that appear frequently in the article but less frequently in other articles.

Each dimension of an article’s tf-idf vector represents the tf-idf statistic for a particular term in our vocabulary. In this pipeline, the terms are named entities, and our vocabulary contains all entities that have appeared at least once in our article dataset.
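
The simplified definition above (term frequency divided by document frequency) can be written out directly. This is a sketch of that plain ratio only. It is not scikit-learn's vectorizer, whose weighting differs in its details, and the function name and sample entities are made up for the example.

```python
from collections import Counter


def tfidf_vectors(articles):
    """articles: one list of entity strings per article.

    Returns (vocabulary, vectors), where each vector has one dimension per
    vocabulary entry, in vocabulary order.
    """
    vocabulary = sorted({entity for entities in articles for entity in entities})
    n_articles = len(articles)
    # Document frequency: fraction of articles that contain each entity.
    doc_freq = {
        entity: sum(entity in entities for entities in articles) / n_articles
        for entity in vocabulary
    }
    vectors = []
    for entities in articles:
        counts = Counter(entities)
        total = len(entities)
        # Term frequency divided by document frequency for each entity.
        vectors.append([
            (counts[entity] / total) / doc_freq[entity] if total else 0.0
            for entity in vocabulary
        ])
    return vocabulary, vectors


# Made-up entity lists for two articles.
articles = [
    ["Kansas City", "Kansas City", "Missouri"],
    ["Las Vegas", "Missouri"],
]
vocabulary, vectors = tfidf_vectors(articles)
print(vocabulary)
print(vectors)
```

"Missouri" appears in both articles, so its document frequency is 1 and it gets no boost, while "Kansas City" and "Las Vegas" each appear in only one article and are weighted up.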

We cluster our tf-idf article vectors using an algorithm called spectral clustering, again using scikit-learn. Spectral clustering consists of three steps. First, we use the similarity of tf-idf vectors between pairs of articles to construct a similarity matrix of our data. Second, we perform dimensionality reduction on this matrix using an eigenvalue decomposition. Finally, we run the k-means algorithm on the resulting low-dimensional representation to obtain our article clusters. We’ve found that for a dataset with 60 articles from six publishers, clustering into seven or eight groups works well.
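
As an illustration of the first of those three steps, here is a small sketch that builds a pairwise similarity matrix from article vectors. Cosine similarity is an assumption for this example, since the similarity measure is not specified here, and the vectors are made up. The eigenvalue decomposition and k-means steps are left to the clustering library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Square matrix whose (i, j) entry is the similarity of articles i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Made-up tf-idf-style vectors: the first two articles share entities,
# the third shares none with either.
vectors = [
    [1.0, 0.0, 0.5],
    [0.8, 0.0, 0.4],
    [0.0, 1.0, 0.0],
]
matrix = similarity_matrix(vectors)
for row in matrix:
    print([round(value, 3) for value in row])
```

A matrix like this can be handed to a spectral clustering implementation that accepts precomputed similarities. Scikit-learn's `SpectralClustering`, for example, supports this through `affinity='precomputed'`.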

Why didn’t we use a probabilistic topic model such as Latent Dirichlet Allocation? We found that topic models such as LDA give you clusters that roughly correspond to sections, such as technology, science, and politics, and not individual news events. This is perhaps because these algorithms allow for an article to belong to multiple topics instead of forcing a hard classification. This doesn’t make sense if topics are to correspond to news events – we know that an article will rarely report on more than one news story.

Here’s a diagram of our full pipeline.

What’s Next?

Recently, Twitter released a product called Moments, which organizes tweets into events using a team of human curators. We want to use our automated process to do the same with news articles, and we’re working towards a web application that displays our news events in real time.

By using algorithms to evaluate the importance of news stories, we give you an easy way to figure out what’s happening in the world right now — without having to organize articles yourself or even wait for human curators.

You’ve heard this story before—boy graduates college, boy moves to New York, boy starts working at a tech startup full of cool nerds, boy finds a career path.
