Archive for July, 2013

At Chartbeat we’re currently pushing to build out many new products. All of our work is focused on data. My part in that is ensuring our infrastructure is prepared to handle all of the data that we need to push through it. At peak on an average day, our servers handle about 130K requests per second. Even a new product that is, essentially, “just looking at the data in a different way” can require a lot of engineering.

Our real-time focus has let us get away with keeping only aggregated data for historical look-back. As products evolve, this will need to grow to a larger and larger fraction of the full signal that we capture, which in turn will need a storage engine to handle it. Redshift is one of the tools we’re looking at. A couple of recent posts have helped us out quite a bit while forging into new territory.

As capturing and storing data becomes easier, simple engineering tasks around that data become harder. Take counting. Basic algorithms to count the number of distinct records in a dataset (that is, assuming the dataset has repeats) have been around for decades, but the simple ones assume that the set of distinct items will fit in memory on one machine. That assumption is routinely broken by today’s data volumes. Recently I’ve been brushing up on the ideas around probabilistic counting.
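As a rough illustration of the trade-off (a sketch, not Chartbeat's implementation), the snippet below compares an exact count, which keeps every distinct item in a set, with linear counting, one of the simpler probabilistic techniques. The function names, the choice of SHA-1 as the hash, and the 65,536-bit bitmap size are all assumptions made for the example.

```python
import hashlib
import math


def exact_distinct(items):
    """Exact count: memory grows with the number of distinct items."""
    return len(set(items))


def linear_count(items, num_bits=1 << 16):
    """Estimate the distinct count with a fixed-size bitmap (linear counting).

    Each item is hashed to one bit position. The fraction of bits still
    zero at the end is used to estimate how many distinct items were seen.
    """
    bitmap = bytearray(num_bits // 8)
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        pos = int.from_bytes(digest[:8], "big") % num_bits
        bitmap[pos >> 3] |= 1 << (pos & 7)
    set_bits = sum(bin(byte).count("1") for byte in bitmap)
    zero_bits = num_bits - set_bits
    if zero_bits == 0:
        raise ValueError("bitmap is saturated; use a larger num_bits")
    return -num_bits * math.log(zero_bits / num_bits)


# 50,000 records, but only 10,000 distinct values.
visits = [f"user-{i % 10000}" for i in range(50000)]
print(exact_distinct(visits))        # 10000
print(round(linear_count(visits)))   # an estimate near 10000, not exact
```

The bitmap here stays at 8 KB no matter how many records stream through it, at the cost of a small estimation error. Linear counting only works while the bitmap is not close to full, which is why larger-scale systems tend to reach for other probabilistic structures.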


Apps We Love

July 16th, 2013 by Doug

At Chartbeat, we all have various apps and browser extensions we love. Whether we build products, crunch data, or work with our clients, we latch onto anything that makes things just that little bit faster, easier, or just generally better.

This is by no means an exhaustive list of our favorites – if you have some apps you think we should check out, post them in the comments!

TL;DR: My personal favorites

– Boomerang
– GitHub
– Venmo
– Songza

Chartbeat favs: e-mail, chat, calendar

– Boomerang (Gmail – Chrome, Firefox) – My favorite thing right now, a Gmail extension to wrangle your inbox. Schedule e-mails to be sent on a certain day if you haven’t gotten a reply, or schedule a thread to pop back into your inbox if you haven’t gotten a response.

– Rapportive (Gmail – Chrome, Firefox, Safari) – Another Gmail extension, but this one tells you who you’re talking to by pulling their info directly from LinkedIn and other networking sites.

– HipChat (in-browser, Mac, Windows, Linux) – Our chat app of choice. In our setup, we have rooms dedicated to general BS, our different teams, and formerly a bizarre one for posting and rating photos of birds (RIP Chartbeat Ornithology). It also allows API integration, and the creation of scriptable bots with Hubot.

– HootSuite and Tweetdeck (in-browser, Mac, Windows) – Great, simple Twitter platforms

– Sunrise (iOS) – Just a great Google calendar app for your iPhone

Chartbeat favs: development and troubleshooting

– Asana (in-browser) – How our teams, especially product and development, manage their projects

– Sublime Text 2 (Mac, Windows, Linux) – Simple, pretty text editor

– GitHub – “Online project hosting using Git. Includes source-code browser, in-line editing, wikis, and ticketing. Free for public open-source code.”

– VirtualBox – Run virtual machines to test things out on other operating systems

– Edit this cookie (Chrome) – Deleting, adding, and editing cookies

– JSONView (Chrome, Firefox) – Prettifies JSON returns. Super handy if you’re an API user.

– User-Agent Switcher (Chrome) – Quickly view a page as a user on another browser. Especially handy if you need to check a mobile site (as long as the switch is based on user agent).

– Window Resizer (Chrome) – A Chrome extension to easily resize your window to test different resolutions

Chartbeat favs: overall productivity apps

– Evernote (Mac, Windows, mobile, in-browser) – Save all your notes and sync them to everything

– Monosnap, Skitch, and Droplr (Mac, Windows, mobile) – Easy ways to take and annotate screenshots

– Alfred (Mac) – Search anything on your Mac or enter quick commands

– Divvy (Mac, Windows) and Moom (Mac) – Better window management and repositioning

– IFTTT – If this then that. Want to get a text based on the weather? Automatically save Facebook photos you’re tagged in to Google Drive? Tons of trigger and action combinations to choose from.

Chartbeat favs: General life

– Venmo (mobile) – App for easily exchanging money with friends. Makes grabbing food or drinks super easy. Use it. Make your friends use it.

– Pocket and Instapaper (in-browser, mobile) – No time to read an article? Save it for later.

– GroupMe (mobile) – Group text messaging. Download the actual app, as using it purely through SMS gets confusing.

– Poncho (browser, mobile) – A daily weather text or email. Critical during this summer of surprise rainstorms.

– Spotify, rdio, Songza – Music, music, and more music. Check out the Chartbeat Spotify app.

– Steam (Mac, Windows, Linux) – I play games. I couldn’t not mention Steam.


Tomorrow, I’ll be at the Association of Alternative Newsmedia conference in Miami. During our panel on Better Content Through Analytics, we’ll be focusing on how editors can get the most out of data to make content decisions, and how to get your newsroom to care about data. I’ll also go into developing a deeper understanding of your visitors, how to match the right content to the right people, and turning your visitors into a loyal audience.

Come by the panel if you’re going to be in Miami, or hit me up if you’d like to chat.

What’s your story, Dion?

My background is in software engineering, and in the past I’ve specifically done things in the pre-Android/iOS mobile space, worked on both the consulting and sales engineering sides of the fence, and been a Solution Architect helping plan large technology deployments. Now I’m VP of Engineering at CreativeWorx, where my days are spent architecting applications that put data and analytics about how our customers work in their hands. I’m also a volunteer for the Coalition for Queens, where we are working hard to build a tech ecosystem in Queens that supports the growth of tech in my home borough.

Where did you first learn about Chartbeat?

I was at a MongoDB Meetup maybe two years ago when I first saw the work Chartbeat was doing. I had limited knowledge of MongoDB, so I started attending Meetups to learn more about how companies were using it. When I was first introduced to MongoDB in its earlier days, there was a lot of skepticism – a lot of my colleagues weren’t convinced it would survive.

Chartbeat stood out at this Meetup because you guys were pushing for real-time decision making based on large volumes of data. Seeing how you were helping companies conduct business around these large data sets, getting real-time feedback from a large audience, making in-the-moment data-based decisions was all very interesting to me.

How has the MongoDB work Chartbeat is doing influenced your own work?

More so than other companies, Chartbeat showed me that you can achieve goals in MongoDB that would normally take a lot of infrastructure and complex technical overhead to implement. What Chartbeat was doing with MongoDB destroyed a lot of the assumptions and skepticism I’d inherited from people who weren’t familiar with using MongoDB. After that Meetup, I just dove right in and started learning it.

Chartbeat’s MongoDB work led me to start building applications with it and test the tech’s true potential. In the end I really did enjoy switching from the world of traditional databases to a set of data where I can do anything I want.

With MongoDB, I have to accept responsibility for maintaining, at the application level, some of the rules that traditional databases have baked in, but in return I have the control to use and alter that data to build more scalable and higher-performing applications. It creates a world where I feel you are truly constructing your data to best serve your application and not trying to have your application be driven by the database’s rules.

What advice would you give to people considering working with MongoDB?

The first piece of advice I would give is to download it, read the SQL to Mongo Query table, and give it a try. Building a simple application that just queried and inserted data was enough for me to really understand how I could use it. And when I say use it, I mean use it as it’s intended: building a document instead of a traditional flat row. There are many videos online about MongoDB Schema Design which will help you understand the advantages and new ways of thinking when building applications with the freedom of a document database.
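To make the "document instead of a flat row" idea concrete, here is a small sketch using plain Python dictionaries rather than a live MongoDB connection. The `users` and `addresses` rows, the field names, and the `to_document` helper are all hypothetical and exist only for illustration.

```python
import json

# Flat, relational-style rows: one "table" per entity, joined by user_id.
users = [{"user_id": 1, "name": "Ada"}]
addresses = [
    {"user_id": 1, "kind": "home", "city": "Queens"},
    {"user_id": 1, "kind": "work", "city": "Manhattan"},
]


def to_document(user, address_rows):
    """Fold the rows that belong to one user into a single nested document."""
    return {
        "_id": user["user_id"],
        "name": user["name"],
        "addresses": [
            {"kind": row["kind"], "city": row["city"]}
            for row in address_rows
            if row["user_id"] == user["user_id"]
        ],
    }


doc = to_document(users[0], addresses)
print(json.dumps(doc, indent=2))
```

In the flat layout, reading one user with their addresses takes a join across two tables. In the document layout, everything the application needs about that user travels together in one record.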

The second piece of advice is that, for all my praise, MongoDB isn’t necessarily right for every application. While I love how easy it is to scale the tech and how fast it is, if you are building a system that relies heavily on transactions you will be challenged. By that I mean systems that require an all-or-nothing approach to updating several pieces of data will find that there currently isn’t a solution for grouping together a set of updates and only doing a commit if all of them can be done successfully. As you can imagine, there are creative ways to get around this depending on your situation, such as using nested objects so the changes are in one document or building your own rollback into your application, but these choices may not be realistic for your application.
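The "build your own rollback" idea can be sketched roughly as follows. This is an in-memory illustration only: the `store` dictionary stands in for a collection, and `apply_all_or_nothing` and `UpdateFailed` are hypothetical names, not part of MongoDB or any driver.

```python
import copy


class UpdateFailed(Exception):
    """Raised when a single update in the batch cannot be applied."""


def apply_all_or_nothing(store, updates):
    """Apply (doc_id, field, value) updates; undo them all if any one fails.

    `store` is a plain dict of documents keyed by id, standing in for a
    collection. Returns True if every update was applied, False otherwise.
    """
    undo_log = []
    try:
        for doc_id, field, value in updates:
            if doc_id not in store:
                raise UpdateFailed(f"no document with id {doc_id!r}")
            doc = store[doc_id]
            # Snapshot the document before touching it so it can be restored.
            undo_log.append((doc_id, copy.deepcopy(doc)))
            doc[field] = value
    except UpdateFailed:
        # Restore snapshots in reverse order of application.
        for doc_id, snapshot in reversed(undo_log):
            store[doc_id] = snapshot
        return False
    return True


accounts = {"a": {"balance": 100}, "b": {"balance": 50}}
print(apply_all_or_nothing(accounts, [("a", "balance", 70), ("b", "balance", 80)]))
print(apply_all_or_nothing(accounts, [("a", "balance", 0), ("missing", "balance", 1)]))
print(accounts)
```

The second batch fails on the unknown id, so the change to account "a" is rolled back. A sketch like this does nothing about crashes midway through a batch or about other writers seeing half-applied state, which is part of why such workarounds may not be realistic for every application.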

Third is get involved in the community. Tech is in a pretty amazing place now as compared to where it was when I first started my career. Companies are so open and helpful today in the ways they share their success and failure. I recently attended MongoNYC and was shocked that I was sitting in a room listening to a presentation from Goldman Sachs about not only how they are using MongoDB in house but how they built their architecture to easily spin up dev and production environments for developers to build applications.

If you aren’t involved in the rising community of tech you are really missing out.

Chartbeat is just one of the many companies that are using MongoDB to solve the big data problem and I’m glad you and others are sharing your experiences with the tech community.

Reposted from Chartbeat Engineer Harry Wolff’s personal blog,

An undervalued skill of the common developer lies in their ability to navigate and explore a new code base. Although not the most glamorous or fun task, it is the one that almost always consumes most of a developer’s working hours.

Despite the percentage of time this work consumes, it is a topic that is seldom discussed on the internet. The reason for that is simple: it’s not fun. It’s not fun understanding someone else’s code, pondering why they arranged code the way they did, why names are named the way they are, and what performance-enhancing drugs they were on.

At every job I’ve started I’ve had to enter the jungle that is “someone else’s code”. Along the way I’ve come up with a few strategies to mitigate the pain. Some are obvious while others are even more obvious.

And please: if you know of any other strategies to reduce the pain of learning a new code base, I would love to hear them. I’m sure everyone reading this post will appreciate it as well.

How Is The Code Structured?

First thing I always ask when encountering a new codebase is, “How is the code structured?” To elaborate: from the time a client makes a request to when the client’s browser renders the web page: what is going on?

Where is the entry point? How is it funneled through the server side code? Where is the client side code initiated?

Where are the pertinent files in the directory structure? Are there any weird gotchas, such as callbacks executing critical code in a non-obvious way, where without the callback nothing would ever render?
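One rough way to answer these questions on a running system is to record each stage a request passes through and read the path back afterwards. The sketch below assumes a Python code base; `handle_request`, `route`, and `render_about_page` are hypothetical stand-ins for real application code.

```python
import functools

CALL_TRACE = []  # records the order in which traced functions are entered


def traced(func):
    """Record each call so the path of a request can be read back later."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALL_TRACE.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


@traced
def render_about_page():
    # Final stage: produce the markup the browser will render.
    return "<html><body><h1>About</h1></body></html>"


@traced
def route(path):
    # Middle stage: pick the handler for the requested path.
    handlers = {"/about": render_about_page}
    return handlers[path]()


@traced
def handle_request(path):
    # Entry point: everything a client request triggers starts here.
    return route(path)


html = handle_request("/about")
print(" -> ".join(CALL_TRACE))  # handle_request -> route -> render_about_page
```

Even a crude trace like this gives the high-level map: where a request enters, which layers it is funneled through, and where the page is finally produced.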

These are the questions I ask both myself and my new co-workers. I try to get a grip on the code, understand the general lay of the land if you will.

This is the part of the process where I want a high-level view of “what is going on?”. The details of the process are distractions at this point: all I need is a working knowledge of all moving parts.

This process takes anywhere from a day to a week in my experience. Usually I take around two days to get a general idea of how the code is structured and what it’s comprised of. At this point I have a naive understanding of the entire system, enough that I feel confident that I’ll be able to make changes, which is the second thing I usually do.

How can I change code and see the results of my work?

In the great words of Yoda, “Do. Or do not. There is no try.”

There is no better way to learn something than by trying it out and actually doing what you are learning. This is true for everything (everything!) including learning code.

When I am given a small starting task at a new job, I take an oblique approach: rather than attack it head on, I reduce the task to the smallest unit of provable success.

For example if I am tasked to add myself to the about page I’d want to do two things:

  1. Find the file that I think is the about page.
  2. Add the most obvious edit to make sure that a) I’m editing the right file and b) that there is nothing else in-between editing the file and seeing that change reflected. (My personal favorite edit is along the lines of <h1>Pandas Rock</h1> appended to the <body> of the page.)
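The same check can be sketched in a few lines of Python. This is only an illustration of the idea: `add_marker` and `marker_visible` are hypothetical helpers, and the `source` string stands in for whatever file and rendered page you are actually working with.

```python
MARKER = "<h1>Pandas Rock</h1>"


def add_marker(html, marker=MARKER):
    """Insert the marker just before </body>, or append it if none is found."""
    index = html.lower().rfind("</body>")
    if index == -1:
        return html + marker
    return html[:index] + marker + html[index:]


def marker_visible(rendered_html, marker=MARKER):
    """True if the page that was actually rendered contains the marker."""
    return marker in rendered_html


source = "<html><body><p>About us</p></body></html>"
edited = add_marker(source)
print(edited)
print(marker_visible(edited))  # True: the edit shows up in this page
print(marker_visible(source))  # False: an untouched (or wrong) file shows nothing
```

If the marker never appears in the page the browser renders, either the wrong file was edited or something sits between the file and the output, such as a build step or a cache.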

Rather than assume I am doing everything correctly, I actually prove it to myself and avoid working on something that could very well be wrong.

The about page that I’m editing may be an outdated one that is no longer used but still exists in the repository, and as such is often confused with the one used in production. By simply testing to make sure I’m in the right place, and doing the right thing, I save myself wasted dev cycles and frustration.

This process of orienting myself before making changes further cements my knowledge and understanding of the code base, and prevents me from getting lost. By forcing myself to always breach the surface and take a gasp of air (i.e. seeing my changes reflected) I’m able to take a dive into the code base to find what I need.

As I’m able to hold my breath longer (i.e. my knowledge of the codebase increases) my need to come up for air and double check I’m in the right place decreases.

What were the design ideas?

After my memory of the code base begins to solidify and coalesce I begin to ask more introspective questions. I begin to wonder why things were done the way they were and what was the reasoning behind these decisions.

These questions allow me to gain a deeper and more intimate understanding of the code and my co-workers. Rather than assume an aspect of the code I find un-intuitive is the result of poor decision making I can ask to find out how it came into existence.

Perhaps as the project was developed requirements changed, causing a rush of modifications that resulted in the code that I had just assumed was the result of poor choice. I may have even taken my assumptions as far as to assume the developer that came before me hated his own kind and wanted all those who came after him to suffer! What a thought!

Through understanding of the why, I can at least appreciate and reconcile my differences with the code rather than begrudge it. Not only that, I can find my co-workers’ dislikes and work together to fix our problems.

Another benefit of understanding the ideas behind the code is its ability to strengthen and deepen my knowledge. While I may have wondered why some component had a certain name that was not representative of its usage, I may learn it once did what it is named to do and why it was changed.

Getting out of the jungle alive

Every code base is unique. Sure, there are overlaps that will make learning a new code base easier: if you use a framework (Ruby on Rails) or follow a common architecture (MVC). However, the spirit of the code lies in the details, and those are near impossible to clone.

So you must enter the jungle that is a new code base. With or without trepidation, you must plunge in head first, without knowing what you’ll find.

And after some time you’ll exit the jungle. You’ll come out wiping your hands free of dirt, turn around, and see a playground with a brand new swing set. It’s the same code base, but now you know what’s inside, and you know the fun you can have. Sure you may add some renovations (always need a vomit wheel), but that jungle will have been tamed. And in taming the jungle, you’ll now be free to build your own.

If you want to follow my adventures in code, visit my personal blog.
Find me on Twitter @hswolff.

Feel free to reach out to me in the comments section.