The Official Klout Blog

Archive for the ‘engineering’ Category

Keeping you Updated on the Score

Monday, September 26th, 2011

Here at Klout, we process terabytes of data every day to help you understand and leverage your influence. We have our own internal metrics to verify that every network is being processed correctly and scoring runs smoothly. We know that processing this data correctly is part of what makes people trust Klout.

This weekend we experienced a bug with our pipeline for processing data for LinkedIn, and this resulted in a score drop for about 0.001% of our scored population which lasted approximately eight hours on Sunday. We know that even though the number of affected users isn’t large, it’s a big deal to anyone who is relying on the score and we take it very seriously.

We’re working to keep you, our users, updated about any issues we experience and to resolve them quickly. We’ve set up a dedicated @KloutStatus Twitter account so you can follow updates as they happen.

We’re always working to improve our processes and we have big improvements coming very soon. If you think you can help, we’re hiring.

Posted in engineering, measuring influence | 41 Comments »

SXSW: Measuring Klout with Distributed Computing

Tuesday, August 23rd, 2011

Klout collects and processes an enormous amount of data to measure online influence — over 19 terabytes every day! To handle the ingestion and storage of this data, we’ve turned to open-source, distributed technologies. At next year’s SXSW we want to share the technical challenges and best practices we’ve found when using open source technology.

I’d love for you to vote for our panel where I’ll be joined by some of Klout’s thought leaders in this area. Ramya Krishnamurthy leads our science team and oversees the development of algorithms for topic detection, scoring, and ranking. Tyke Lewis leads our consumer teams and has been instrumental in our adoption of Node.js to provide a scalable platform to deliver realtime streams. Derek Wollenstein, a key member of our platform team, builds massively scalable systems to ingest, analyze, and deliver Big Data.

You will leave this workshop with tips on some of the shortcomings of open source technologies and actionable knowledge on how to build your own Big Data pipeline.

Posted in engineering, measuring influence | 2 Comments »

Klout Welcomes Dave Mariani as VP of Engineering

Monday, May 16th, 2011

Here at Klout we believe we have one of the most exciting engineering challenges you can find on the web. We are building PageRank for people, and the amount of data we analyze is staggering. On a daily basis we calculate the Klout Score for over 75 million people. To do this we:

  • Ingest and semantically analyze 100+ million tweets, Facebook status updates and LinkedIn updates on a daily basis
  • Measure interactions across over 4 billion social graph edges each day
  • Analyze over 6 billion status updates to understand which of the nearly 1 million topics in our ontology a user is influential about
  • Process over 50 other variables that are features in our scoring algorithm
  • Serve hundreds of millions of API calls from our own API to our 2000+ partners

We love this challenge but know that to continue being the standard for measuring online influence, we have a lot of work ahead of us. With that in mind, we are very pleased to announce that David Mariani is joining the Klout team as Vice President of Engineering (see the TechCrunch post).

Dave is a proven winner with big data experience. Most recently, Dave served as Vice President of User Data and Analytics at Yahoo. While at Yahoo, Dave managed engineering for all of Yahoo’s audience and advertising analytics platforms, which process 30+ billion user and advertising events per day (>20TB/day) to improve customer engagement on Yahoo! properties while driving better advertising yields. Dave joined Yahoo through the $300M acquisition of Blue Lithium, where he served as CTO.

Dave’s goal at Klout will be to continue building a world class engineering team and culture. Even though we currently process an amazing amount of data, we are also challenging Dave to add more services for us to analyze and to do this analysis faster and with more granularity, transparency and actionability. This is a huge challenge but we couldn’t imagine a better person for the job!

Posted in announcements, engineering | 82 Comments »

Engineering Influence

Monday, May 16th, 2011

Our goal is to be the standard for influence. The advent of social media has created a huge number of measurable relationships. On Facebook, people have an average of 130 friends. On Twitter, the average number of followers ranges from 300+ to 1000+. With each relationship comes a different source of data. This has created A LOT of noise and an attention economy. Influence has the power to drive this attention.

When a company, brand, or person creates content, our goal is to measure the actions on that content. We want to measure every view, click, like, share, comment, retweet, mention, vote, check-in, recommendation, and so on. We want to know how influential the person who *acted* on that content is. We want to know the actual meaning of that content. And we want to know all of this, over time.

Measuring influence is a bit like trying to measure an emotion like hate or jealousy. It’s really hard and takes a boatload of data.

A huge part of what we do is develop machine learning models that make sense of this data. On top of that, there’s an endless amount of this data and we need a platform to ingest, prepare, and analyze it.

The two biggest platforms are Facebook and Twitter, but it hardly ends there when it comes to social media. There’s LinkedIn, Foursquare, Path, YouTube, Quora, and many others. This presents the challenge of creating models for each platform and building data analysis platforms that can handle unstructured data.

To handle this at Klout, we’ve turned to open source technologies.  We rely on Cloudera’s CDH3 Hadoop distribution for analysis and many of our data services. Another exciting open source technology we’ve recently embraced is Node.js.  Node.js provides incredibly fast performance and asynchronous event processing, all at a massive scale. This is important to us as our products scale and get more and more realtime.

Twitter Influence

Twitter was the natural choice for our first network to analyze because its data is open and the actions you can take on it, such as a mention or a retweet, are simple.

However, as our models matured, Twitter kept growing. As of this post, our Twitter cluster has the following stats:

  • 75 million people scored daily
  • 4 billion graph edges scored daily
  • 48 million people are influenced by or influence an average of 27 people
  • We derive hundreds of thousands of different topics that 14 million users are influential on
  • On average, 5 topics per user, derived using NLP and semantic analysis
  • For topics, 3 months of mentions and retweets are analyzed, currently over 6 billion

[Figure: Twitter Analytics Overview]

From the Twitter firehose, data is written to disk in buffered chunks. A MapReduce job then sorts the firehose data into the different buckets needed by each workflow. These workflows serve different products, from bot and spam detection to scoring to topic extraction.
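As a rough illustration of the bucketing step, here is a minimal Python sketch of a map step that tags each record with the workflows that need it and a reduce step that groups records by bucket. This is not Klout's actual job: the record fields (`text`, `mentions`, `retweet_of`) and the routing rules are assumptions made up for the example, and only the three workflow names come from the post.

```python
from collections import defaultdict


def map_record(record):
    """Map step: emit (bucket, record) pairs for every workflow that needs the record.

    The routing rules below are hypothetical; the post does not say how
    records are assigned to workflows.
    """
    # Assumption: every record is checked by bot and spam detection.
    yield "spam_detection", record
    # Assumption: only interactions (mentions, retweets) feed scoring.
    if record.get("mentions") or record.get("retweet_of"):
        yield "scoring", record
    # Assumption: any record with text feeds topic extraction.
    if record.get("text"):
        yield "topic_extraction", record


def reduce_buckets(pairs):
    """Reduce step: group the emitted records by bucket name."""
    buckets = defaultdict(list)
    for bucket, record in pairs:
        buckets[bucket].append(record)
    return dict(buckets)


def bucket_firehose(records):
    """Run the map step over all records, then group the results."""
    pairs = (pair for record in records for pair in map_record(record))
    return reduce_buckets(pairs)
```

In a real Hadoop job the grouping would be done by the framework's shuffle phase rather than by an in-memory dictionary; the sketch only shows the shape of the data flow.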

Many of our MapReduce jobs are written in Java, but we also rely on Pig Latin for some purposes, such as performing simple joins or computing population aggregates and statistics.
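To make "simple joins" and "population aggregates" concrete, here is a small Python sketch of the kind of operation a short Pig Latin script would express: an inner join of two datasets on a user ID, followed by a grouped average. The field names (`user_id`, `network`, `score`) and the data layout are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def join_on_user(users, scores):
    """Inner join of two record lists on user_id.

    Assumes at most one score per user (an assumption of this sketch).
    """
    scores_by_user = {s["user_id"]: s["score"] for s in scores}
    return [
        {**u, "score": scores_by_user[u["user_id"]]}
        for u in users
        if u["user_id"] in scores_by_user
    ]


def average_score_by_network(joined):
    """Population aggregate: mean score per network."""
    grouped = defaultdict(list)
    for row in joined:
        grouped[row["network"]].append(row["score"])
    return {network: mean(values) for network, values in grouped.items()}
```

Users without a score are dropped by the join, so they do not affect the averages.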

Oozie is used to coordinate the different workflow components. To serve data both internally and externally, we dump raw CSV files or load the data into HBase, which interfaces with load-balanced API servers.
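The core idea of workflow coordination is running each step only after the steps it depends on have finished. The Python sketch below shows that idea using the standard library's `graphlib.TopologicalSorter`. It is not Oozie's API or configuration format, and the `run_workflow` helper and the step names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, dependencies):
    """Run each step once, after all of the steps it depends on.

    steps: maps a step name to a callable that takes a dict of its
        dependencies' results and returns this step's result.
    dependencies: maps a step name to the set of step names it depends on.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Collect the outputs of the steps this one depends on.
        inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = steps[name](inputs)
    return order, results
```

For example, with `dependencies = {"features": {"prepare"}, "score": {"features"}}`, the steps run in the order prepare, features, score, and each step receives the output of the one before it.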

[Figure: Twitter Scoring Workflow]

We use a machine learning and statistics-based approach to perform our scoring. The model currently has over 35 features. The scoring workflow consists of different Oozie jobs, many of which perform feature extraction. In the final jobs of this workflow, all the features are fed into the scoring model, which produces scores.
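The two-stage shape described here, feature extraction followed by a model that turns features into a score, can be sketched in a few lines of Python. Everything specific in this sketch is an assumption: the post does not name the features, the weights, the form of the model, or the output scale, so the three features, the logistic function, and the 0 to 100 range are placeholders chosen for illustration.

```python
import math

# Hypothetical features and weights; the real model has over 35 features
# and its form is not described in the post.
WEIGHTS = {"retweets": 0.8, "mentions": 0.6, "followers": 0.3}
BIAS = -4.0


def extract_features(activity):
    """Feature extraction: log-scale raw counts so very large counts do not dominate."""
    return {name: math.log1p(activity.get(name, 0)) for name in WEIGHTS}


def score(features):
    """Scoring model: logistic function of a weighted sum, scaled to 0-100 (assumed scale)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 100.0 / (1.0 + math.exp(-z))
```

A call such as `score(extract_features({"retweets": 120, "mentions": 45, "followers": 900}))` returns a value strictly between 0 and 100, and with these positive weights more activity always yields a higher score.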

We’ve experimented with Mahout in the past and we will be using more of it in the future.

Our mission to measure influence is nowhere near complete. Luckily, here at Klout we believe in taking on the biggest challenges, and that is just what we are doing.

Do you have any questions or comments? Let us know! Also, if you want to take on these challenges with us, consider joining.

This post is by our CTO, Binh Tran, and adapted from a post for Cloudera’s blog.


Posted in engineering, measuring influence | 57 Comments »

From Hackathon to Market – Klout for Chrome (beta)

Thursday, February 17th, 2011

Recently, we had a joint hackathon with the teams at bit.ly and Klout – it rocked.

Combine great energy, awesome developers and a company culture that values rapid innovation and what do you get? Products like Klout for Chrome (beta release). Using your Chrome browser, you can install the Klout extension by clicking here.

Klout for Chrome places a Klout influence score next to the people you follow when using http://twitter.com. From the moment you install the Google Chrome browser extension you’ll see Klout influence scores in your stream.

We’ve found it particularly useful when applied to Twitter lists. Now you can check out people’s relative influence within any Twitter list. Or conversely, it’s fun to build new Twitter lists based on people’s Klout scores.

Enjoy! This is a beta release, and we’re looking forward to hearing your feedback.

Posted in engineering, other | 225 Comments »

The Science of Influence: SXSW Panel

Friday, August 13th, 2010

Expressing oneself via interactions with others is the essence of being human. The explosion of social media sites like Twitter and Facebook has opened up new modes of interaction for people to express themselves. In the vast domain of social media, some individuals act as key trend setters and influencers for many to follow.

Using machines to analyze and quantify online behavior/interaction patterns allows us to identify these trend setters and understand what makes them influential.
Designing systems to determine these influencers has been an area of fascination for me.

This is why I’ve suggested a panel at SXSW discussing the Science of Influence – a topic that’s becoming ever more important in the online (and offline) world. I’d love to hear your thoughts on this panel idea and any suggestions on who else you’d like to see on it.

Thanks for your time, and please do vote for the panel if you like the idea. Thanks!

Also, stay tuned for a post on a panel organized by Joe Fernandez, founder of Klout – Influencers Will Inherit the Earth. Quick, Market Them!

Posted in announcements, engineering, measuring influence | 186 Comments »