Hannah Keiler was our Fall 2013 Data Science intern here at Chartbeat, working with Chief Data Scientist Josh Schwartz. Hannah is a senior at Columbia University, where she studies Statistics with a concentration in Computer Science. This blog post details one of several projects she tackled during her internship at Chartbeat.
At Chartbeat, we sometimes want to compare metrics across similar sites. There are several different ways to group sites. For example, you can begin by thinking about grouping sites by size – comparing metrics like number of readers or articles published each day. We were also interested in grouping together sites that write about similar content. Grouping sites by content manually for thousands of domains is incredibly tedious, so we wanted to devise a metric that would allow us to group similar sites automatically.
One way to define sites as having similar content is if they write on similar subjects at around the same time. If sites write about the same subjects, they are probably using the same key words, like “Obama” or “Syria.” We knew that the words that best summarize the content of an article are likely the words appearing in its headline. Keeping these ideas in mind, we developed our metric.
We start by comparing sites two at a time. Let’s call the sites A and B. We look at the words used in the headlines in A and B day by day.
For each day, we record the words used in both A and B and compute a weighted sum of their counts. That is, we divide the number of times a given word occurs in both A and B on that day by a number indicating how often that word occurs in headlines in general. Weighting the word counts helps us pick out pairs of sites that write about niche topics, because rarer words receive more weight. We add up these weighted values over all shared words for each day, and then add up the daily totals over all of the days. Let’s call this final sum “Value 1.”
We also record all of the words used in headlines by either A or B for each day. For each day we compute the same kind of weighted sum over these word counts, and then add up the daily sums into one value. Let’s call this “Value 2.”
Then we divide Value 1 by Value 2. We now have a ratio of sorts of the number of words A and B share versus the number of words they use in total.
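To make the steps above concrete, here is a minimal Python sketch of how Value 1, Value 2, and their ratio might be computed. It is an illustration under stated assumptions, not the code behind the results in this post: the function names, the input layout (a dict mapping each day to a list of headlines), and the simple regex tokenizer are all hypothetical. It also assumes that a word's count for a day is its combined count across both sites, and that a word missing from the general frequency table gets a weight of 1.

```python
import re
from collections import Counter


def word_counts(headlines):
    """Count the words in a list of headlines (naive lowercase tokenizer)."""
    counts = Counter()
    for headline in headlines:
        counts.update(re.findall(r"[a-z0-9']+", headline.lower()))
    return counts


def similarity(site_a, site_b, general_freq):
    """Ratio of weighted shared-word counts to weighted total word counts.

    site_a, site_b: dict mapping a day to that site's list of headlines.
    general_freq: dict mapping a word to how often it occurs in headlines
    in general (assumption: unknown words default to 1).
    """
    value_1 = 0.0  # weighted counts of words used by both sites
    value_2 = 0.0  # weighted counts of words used by either site
    for day in set(site_a) | set(site_b):
        counts_a = word_counts(site_a.get(day, []))
        counts_b = word_counts(site_b.get(day, []))
        for word in set(counts_a) | set(counts_b):
            # Assumption: a word's daily count is its combined count on A and B.
            weighted = (counts_a[word] + counts_b[word]) / general_freq.get(word, 1)
            value_2 += weighted
            if word in counts_a and word in counts_b:
                value_1 += weighted
    return value_1 / value_2 if value_2 else 0.0


# Toy usage with made-up headlines and frequencies.
site_a = {"2013-11-01": ["Obama speaks on Syria"]}
site_b = {"2013-11-01": ["Syria talks resume"]}
general_freq = {"obama": 2, "syria": 2, "speaks": 1, "on": 4, "talks": 1, "resume": 1}
print(round(similarity(site_a, site_b, general_freq), 3))  # prints 0.211
```

Under these assumptions the score always falls between 0 (no shared words on any day) and 1 (identical word usage every day). Another reasonable reading would use the smaller of the two sites' counts for shared words; the structure of the calculation stays the same.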
How does this look?
We first computed the similarity metric for sites whose content we thought was geared towards sports, music, or celebrity/entertainment news. To visualize the similarity metric, we plotted the sites as nodes in a connected graph.
FYI: These graphs are anonymized because we don’t share individual client data.
The distance between the sites represents their similarity: sites that sit closer together have a higher similarity metric. On this graph, the sports sites are dark blue, the celebrity sites are red, and the music sites are teal. As you can see, sites with similar content group together! The fact that the celebrity sites are in the middle implies that they share some content with music and sports sites, which makes sense. The outlier posts fewer articles daily than the other celebrity news sites, so there was less overlap in term usage and, accordingly, the similarity metric was lower.
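One way to prepare data for a graph like this is to score every pair of sites and turn each score into a distance. The Python sketch below is only an illustration: `similarity_edges` and `jaccard` are hypothetical names, the toy word sets stand in for real headline data, and using `1 - similarity` as the distance is an assumption rather than the layout method used for the plots here.

```python
from itertools import combinations


def similarity_edges(sites, similarity_fn):
    """Return (site_a, site_b, similarity, distance) for every pair of sites.

    sites: dict mapping a site name to whatever data similarity_fn expects.
    similarity_fn: function returning a score between 0 and 1 for two sites.
    """
    edges = []
    for a, b in combinations(sorted(sites), 2):
        score = similarity_fn(sites[a], sites[b])
        # Assumption: more similar sites should sit closer, so distance = 1 - score.
        edges.append((a, b, score, 1.0 - score))
    return edges


def jaccard(words_a, words_b):
    """Toy stand-in for the similarity metric: shared words over all words."""
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


# Made-up sites, each reduced to a set of headline words.
toy_sites = {
    "sports_1": {"knicks", "nets", "playoffs"},
    "sports_2": {"knicks", "playoffs", "draft"},
    "music_1": {"album", "tour", "grammys"},
}
for edge in similarity_edges(toy_sites, jaccard):
    print(edge)
```

The resulting edge list can then be handed to whatever graph layout tool you prefer, with the distance column controlling how far apart two nodes are drawn.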
We also tried out our metric with British and Australian news. We get the graph below.
Here, the UK sites in red group together and the Australian sites in teal group together. The outlier writes more niche news stories than general Australian news, so it had less overlap with the other Australian and British news sites.
These initial results show that sites that post articles with the same topics in the headlines at around the same time tend to be similar types of sites. Moving ahead, this could be a great way to group sites into different categories based on their content.
So you just graduated from college. Or you’re stuck in the library dreaming about money. Ok, you’re just wasting away at a company that you don’t give a damn about. It’s the perfect time to find work at a tech startup. Not a developer? Never written a line of code? No worries. I got your back.
The idea of working at a startup is appealing, for sure. But if you don’t know where to start, things could get ugly. Here are three surefire ways to find and get hired by an awesome tech startup.
Step 1: Remove the Weeds
According to one estimate, about 906,241 tech startups existed in the US as of 2010.
How the hell are you supposed to find a startup that is both building something exciting and completely legitimate?
You know yourself better than I do. I won’t tell you what’s exciting (real-time data anyone?) and what isn’t. But once you’ve identified that, here are a few ways to pick a winner (and make a good first impression on your interviewers).
Unsure of what you’re looking for? Search through portfolio companies of top Venture Capital firms to find startups that are well-funded and looking for someone with your distinct skillset.
Know exactly what you want? Use CrunchBase, a powerful startup database, to find a company as specific as: a BioTech startup within 5 miles of Manhattan, founded after May 2000, with a maximum of 200 employees, and $1 million in funding (and other specific attributes).
Willing to live in NYC and need to start ASAP? Check out Made in New York City to find all the currently hiring startups in NYC.
Step 2: Expand Your Circle
You have an idea of where you want to work, but you don’t have any connections within the startup realm. It’s time to go to tech meetups and find tech-literate people. You probably want to stop reading this post right about here – but hang on!
We’re all in agreement that forced-networking is the most agonizing experience a human being can endure. Don’t worry. I’m not telling you to spam an event with your business card.
Check out a meetup that genuinely interests you, and have one meaningful conversation.
Play pickup basketball with people interested in tech.
See what all the big names in the industry are talking about at a large tech and entrepreneurship event.
Get out there and get acclimated to your new universe. You’ll get a sense of whether or not you’re a good fit. And, you will reduce your anxiety about interviewing at a big, bad tech company.
Step 3: Share YOUR Story
Speaking of interviews, when you’re sitting across the table from your future boss, she only really wants to know one thing. Can you fill the hole that’s plaguing the company?
Developers rely on a track record of building scalable systems or shipping web applications to prove their worth.
We (mere mortals) rely upon our ability to tell a succinct, compelling story. The chapters of your story are already there. It’s time for you to find the common thread, and sew that bad boy together. Your story, composed of your work and personal experiences, MUST align with the startup’s mission and requirements for the role.
Say, for example, you’re interviewing at a startup that wants to make life easier for small business owners, and needs someone with related experience.
When you’re asked to talk about your interest in the company, share your story of working at your mother’s clothing store every summer (if it’s true). Dig into the specific challenges she faced as a small business owner. Share the lessons that you can apply if you’re hired by this startup. Tie in how the startup’s product would have improved visibility for your mom’s store and increased sales.
It will break the ice, show your deep understanding of the product/mission, and force the interviewer to remember you. Who knows. It may even help you land the job.
So that’s all I’ve got for now. Feel free to sound off in the comment section and I’ll do all I can to help you land that job you’re working towards.
I just wrote a piece for AlleyWatch about the difficulty I’ve experienced in hiring female engineers at Chartbeat. We’re hoping to source some great ideas from people like you – so please share ideas and opinions in the Comments below. You can read the whole piece here, but enjoy this snippet below.
Like a number of growing startups in New York City, the Chartbeat engineering roster is impressive – and getting larger by the day. Since our second round of funding in April 2012, Chartbeat has more than doubled in size, hiring 39 new employees, including 16 engineers. Hiring developers in general is no easy task, as FastCompany explained in Why Your Startup Can’t Find Developers. So we’re incredibly proud of our growth, but there is one huge, glaring gap: we don’t have a single female engineer – and we never have in our four years of existence. And that simply must, no questions asked, change.
As Head of Talent at Chartbeat, this responsibility rests with me, and I will tell you that since I joined about a year ago, we’ve tried everything, from traditional job postings to leveraging our seemingly cool company brand at every opportunity, but we’ve continued to fail at hiring female technical talent.
The bad and good news is: we are not alone in this problem.
Hiring female engineers isn’t a novel issue. The New York City Economic Development Corp says that only 9.8% of the female workforce is employed in a tech-related industry in the city, even though 39% of women with a bachelor’s degree majored in the science, technology, engineering and math fields. So why aren’t they joining us? There’s no single, simple answer, so I won’t even try to break it all down. It’s pretty obvious that the stories hitting the front page of ValleyWag every day about the latest Pax Dickinson or the latest rage-inducing brogrammer culture example aren’t helping to solve the problem. But the division starts long before the workplace.
According to a 2010 study conducted by Women in Computer Science (WiCS) at Stanford, only 15% of all computer science undergrads were female. A gap in education this severe no doubt directly influences the genetic makeup of the tech scene.
But we know all this stuff. We’ve heard about it ad nauseam. So why are we bringing it up? To be honest, we need your help.
“Changing the ratio” is discussed constantly by smart folks like Rachel Sklar who are leading the charge, both on and off social media, on conference panels, in blog posts, and in the tech press, because it’s such a far-reaching issue (much farther than just the male-to-female ratio). But the tactical challenge of hiring female talent isn’t addressed all that often.
Like many problems in the tech industry, the issue of available female engineers might best be addressed through open sourcing, so that’s what we’re doing. We’re doing this publicly and transparently to address this issue head-on in a personal way, rather than as a theoretical discussion. I’m sharing what we know right now, what we need to learn, and how we plan to get the knowledge we need in order to create actionable plans going forward, so you can tell us what we’re doing wrong and how we can do a better job.
Keep reading here. And please let me know what you think in the Comments!
We’re on our fourth Chartteam Spotlight y’all and it’s about time I show you how the product-side of Chartbeat lives. After all, what’s a tech startup without the various individuals who make our products all that they can be?
What cool things are you working on these days?
There’s a huge education initiative at Chartbeat and within Chartcorps right now, as we look to roll out the next version of Chartbeat Publishing. We’re trying to create the right materials to show our clients how to get the most out of this fiery thing! As part of that, our team has broken down the publishing product function-by-function to understand how and why people are using Chartbeat the way they are, and how we can educate our customers so they can use it even more effectively. It’s been a cool way for me to understand the narrative of Chartbeat’s story with our users in mind: how did the product come to be, and where is it going?
What challenges are you facing as we continue to grow?
My biggest challenge is figuring out how to capture all the knowledge from the wise men who came before me at Chartbeat, and how to leave behind time capsules of knowledge for those who will need to learn about our products in the future (as clients or as Chartteam members).
Sometimes it feels like we’re moving just as fast as the data pours in, and it becomes a race against the very thing we created. How do we learn more so that we can serve our clients’ needs on a better level? How do we adequately take their feedback and channel it to all of our sharp engineers, designers, data scientists, and product owners so that they can create a better product for our clients’ needs? It’s a race against the pings themselves!
What have you learned working here?
How the rubber meets the road, and where I can be a part of that. I think we’ve all learned this one way or another, but the Chartcorps is in a funny position because we have our hands in many parts of the product, and certainly the client support side of things as well. When you’re put in a situation like this, sometimes your biggest responsibility is to simply make sure the car keeps moving forward – even if it’s not quite within your “title” (whatever that is these days?!) – to do what you need to do to make that happen. This position has taught me quite a bit about organizational structure and how efficient and awesome teams are built. It’s also taught me about what it means to own many little responsibilities, to help keep things going forward.
What do you think is one of the best Chartbeat perks?
The culture of unlimited learning really is the best perk to me. It’s really an awesome thing to be a part of. There are a ton of talented people here, many of whom have specialized skills that make them, to me, the best at what they do out of anyone I know. And when they take time off from being great at what they do, they’re willing to teach someone else a little bit of their talents…they don’t have to…they just do. We have group coding classes, “Startbeat” meetings where we learn about how startups work, guest speakers, lunch n’ learns, learning over beers, learning about beers, learning to make beers, learning to learn. It’s a lot of learning, a lot of fun.
So you see, we’re a hardworkin’ crew here at Chartbeat and our varied backgrounds power this startup every single day. If you’re intrigued, keep an eye on our Jobs page for forthcoming opportunities to join the Chartteam.