The Official Klout Blog

Archive for the ‘engineering’ Category

How Klout Turned Big Data Into Giant Data

Thursday, October 11th, 2012


Last November, I wrote a post called Big Data, Bigger Brains. In that post, I wrote about how we were able to make business intelligence (BI) work in a Big Data environment at Klout. That was a big step forward for Klout, but our work wasn’t yet done. Recently we launched a whole new website and a new Klout Score that substantially upped the stakes.

Really Big Data

At Klout, we need to perform quick, deep analysis on vast amounts of user data. We need to set up complex alerts and monitors to ensure that our data collection and processing is accurate and timely. With our new Score, we quadrupled the number of signals we collect. This means that we now collect and normalize more than 12 billion signals a day into a Hive data warehouse of more than 1 trillion rows. In addition, we have hundreds of millions of user profiles that translate into a massive “customer” dimension, rich with attributes. Our existing setup, which connected Hive to SQL Server Analysis Services (SSAS) through a MySQL staging area, was no longer feasible.

Bye-Bye MySQL

So, how did we eliminate MySQL from the equation? Simple. We used Microsoft’s Hive ODBC driver and SQL Server’s OpenQuery interface to connect SSAS directly to Hive. Kay Unkroth and Denny Lee of Microsoft and I wrote a whitepaper detailing the specifics. Now we process 12 billion rows a day by harnessing the power of Hadoop and HiveQL right from within SSAS. For our largest cube, updating a day’s worth of data takes about an hour; yes, that’s 12 billion rows. By combining a great OLAP engine like SSAS with Hive, we get the best of both worlds: 1 trillion rows of granular data exposed through an interactive query interface compatible with existing business intelligence tools.
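The whitepaper has the full details; as a rough illustration of the OpenQuery part, the sketch below builds the kind of pass-through statement SQL Server sends to a linked server backed by the Hive ODBC driver. The linked-server name `HIVE`, the `signals` table and its columns are hypothetical, not Klout’s actual schema. Because OPENQUERY takes its query as a T-SQL string literal, any single quotes inside the HiveQL have to be doubled.

```scala
// Hypothetical helper: wrap a HiveQL statement in a T-SQL OPENQUERY call so that
// SQL Server (and an SSAS data source view on top of it) passes it through to Hive.
def openQuery(linkedServer: String, hiveQl: String): String = {
  // OPENQUERY's second argument is a string literal, so embedded quotes are doubled.
  val escaped = hiveQl.replace("'", "''")
  s"SELECT * FROM OPENQUERY([$linkedServer], '$escaped')"
}

// One day's partition of a hypothetical signals table.
val dailyLoad = openQuery(
  "HIVE",
  "SELECT user_id, network, signal_type, COUNT(*) AS signals " +
    "FROM signals WHERE dt = '2012-10-10' " +
    "GROUP BY user_id, network, signal_type"
)

println(dailyLoad)
```

Pushing the GROUP BY into the HiveQL lets Hadoop do the heavy aggregation before SSAS reads a single row.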

What I really want for Christmas…

So, what’s wrong with this story? What I really want is for the OLAP engine itself to reside alongside Hive/Hadoop rather than live alone in a non-clustered environment. If the multi-dimensional engine resided inside HDFS, we could eliminate the double write (write to HDFS, then write to the cube) and leverage the aggregate memory and disk across the Hadoop cluster for virtually unlimited scale-out. As an added benefit, a single write would eliminate the cube-loading latency and vastly simplify the operational environment. I can dream, can’t I?

Do you want to take on Big Data challenges like this? We’re looking for great engineers who can think out of the box. Check us out.

Posted in engineering | 8 Comments »

Klout Gets Hacking

Friday, October 5th, 2012


The end of the quarter is the perfect time to take a day (or two or three) to celebrate the awesome work completed the past 90 days. And what better way to celebrate than…more work?! Well, a hackathon to be precise.


The Front End team discussing their Hackathon project

For those unfamiliar, a hackathon is a contest where developers (designers, marketers and product-minded folk are also welcome) work for a set amount of time, usually 24 hours or a weekend, to take a new product or piece of software from inception to prototype. It’s a bit of an engineering tradition that is picking up steam in startups and in particular industries, many of which host internal and external events and award prizes to the winners.


Klout team working on their code and design during the Klout Hackathon

Klout hosted a 24-hour internal hackathon for Klout employees to work on pet projects, innovations, and new ideas using our usual product roadmap. While we can’t promise that you’ll see any of these work their way onto Klout.com, we’re excited to find a place for many of the projects in the future. The turnout was impressive: 17 teams of one to four people worked on ambitious projects. Here are a few of them:

The winning app was an Influence Landscape Visualization by Keith Walker. Keith took a concept we’ve talked about many times, and have seen a few of our Partner Developers implement on our API in the past: plot the influence graph of a user and make it a visual experience. An influencer’s connections are displayed as dots sized according to their Klout Scores. You can drill down on any of these connections to see that influencer’s connections, and so on and so forth down the rabbit hole.

Our second winning hack looked at surfacing new insights from our recent Klout Moments feature by casting Moments through the lens of location and your immediate network. The team (Mao Ye, Mark Azevedo, Sreevatsan Raman, and Adithya Rao) actually conceived this as two separate projects, then realized they could easily mesh into one. Moments can be viewed by city, so you can see, for example, the top influential content in San Francisco. Curious what the people you influence are saying? Their recommended moments surface their top content.

The rest of the submissions were no less notable, including heat cloud visualizations, sentiment analysis, scientific discovery of movers and shakers, real-time backend tracking, an influential business radar, a new scoring system for career growth, and a uniquely polished presentation on affiliate marketing for bars and clubs. Even our CEO, Joe Fernandez, piloted a group to success, offering a new take on creating a street team: mobilizing your influencer network to drive change and get the word out using our notification system and circles.

We plan to make this a quarterly tradition, and we know that each time the quality of work will be top-notch and will find its way into the Klout experience. All of the teams did a fantastic job and showed a diversity of approach and vision while also rallying cohesively around concepts we’ve collectively considered for some time now. We couldn’t be more proud.

Posted in engineering | No Comments »

Scaling the Klout API with Scala, Akka, and Play

Tuesday, October 2nd, 2012


Back in March, Felipe Oliveira wrote about Klout’s new Sexy API. We had just released the Scala Play! Framework API infrastructure that we had been writing over the previous few months. Not only did it represent a big step forward on the tech side, but it was also an important cultural change for Klout. Previously, disparate teams were responsible for their own serving infrastructure; now, having a central platform has enabled Klout to scale to a billion API requests per day and export powerful new functionality to partners.

But we still had a lot of work to do back then. By now, six months after launch, we’ve made some serious improvements to the API’s scalability and availability using Akka’s rich toolset for concurrent programming. Though Akka is mostly famous for its implementation of the Actor Model, I’m going to talk about two other Akka features, Futures and Agents.

Scalability with Akka Futures

For some background on the scalability problems we face, consider that serving a simple profile page like mine requires hundreds of lookups to several different datastores. Because of the scale of Klout’s data pipeline (expect a future blog post by Sreevatsan Raman to shed more light), we need to store users’ Scores, Moments, Topics and other data in different datastores. As a result, our app is heavily IO-bound, and optimizing our IO usage was one of our biggest priorities: we needed to do our IO concurrently.

Akka Futures (soon to be part of the Scala Standard Library) have proven to be the ideal tool for concurrent work. A Future represents an asynchronous computation, and many Futures can run in parallel. The Future API in Akka is very rich, but the key for us is its monadic nature. If you don’t know what a monad is, for our purposes in Scala it is something that has map and flatMap methods. These allow Futures to be composed into new Futures with the syntactic sugar of a for expression. Compare this to Java Futures (java.util.concurrent.Future), which have no means of composition.
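To make the map/flatMap point concrete, here is a small sketch using scala.concurrent.Future, the standard-library successor to Akka’s Future since Scala 2.10 (not the Akka 2.0 API Klout used at the time). It shows that a for expression over Futures is simply rewritten into flatMap and map calls.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Two independent asynchronous computations (stand-ins for datastore calls).
// Both start running as soon as they are created.
val fa: Future[Int] = Future(20)
val fb: Future[Int] = Future(22)

// The for expression...
val viaFor: Future[Int] = for {
  a <- fa
  b <- fb
} yield a + b

// ...is rewritten by the compiler into flatMap and map calls:
val viaCalls: Future[Int] = fa.flatMap(a => fb.map(b => a + b))

// Blocking here only so the demo can print its result.
println(Await.result(viaFor, 1.second))   // 42
println(Await.result(viaCalls, 1.second)) // 42
```

Note that fa and fb are created before the for expression. Creating them inside it (a <- Future(20); b <- Future(22)) would make the second computation wait for the first, because flatMap only runs its function after the outer Future completes.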

Consider the following example, which has three methods that call different datastores and each return a Future:

[gist id=3815934 file=futuresSetup.scala]

Additionally, we have a resulting type we’d like to combine the results into:

[gist id=3815934 file=profile.scala]

Now, how should we do this? The non-monadic way, similar to how we would do it in Java, is to start each of the tasks, wait for them, and then build the result:

[gist id=3815934 file=futureBad.scala]
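The gist’s exact contents aren’t reproduced here. As a hedged sketch of the same blocking pattern, assume three hypothetical lookups (fetchScore, fetchTopics, fetchMoments) and a hypothetical Profile type, again using the Scala standard-library Future:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for three datastore lookups that each return a Future.
def fetchScore(userId: Long): Future[Double]        = Future(55.0)
def fetchTopics(userId: Long): Future[List[String]] = Future(List("scala", "big data"))
def fetchMoments(userId: Long): Future[Int]         = Future(12)

// Hypothetical result type combining the three lookups.
case class Profile(userId: Long, score: Double, topics: List[String], moments: Int)

// Blocking style: start the lookups, then park the calling thread on each one.
def profileBlocking(userId: Long): Profile = {
  val scoreF   = fetchScore(userId)
  val topicsF  = fetchTopics(userId)
  val momentsF = fetchMoments(userId)
  // Each Await.result ties up this thread until that Future completes or times out.
  val score   = Await.result(scoreF, 2.seconds)
  val topics  = Await.result(topicsF, 2.seconds)
  val moments = Await.result(momentsF, 2.seconds)
  Profile(userId, score, topics, moments)
}

println(profileBlocking(1L))
```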

This is not ideal for a few reasons:

1. We are blocking on the concurrent tasks, which means the thread running this code must sit idle, wasting resources while the app performs network IO.
2. It is rather verbose and difficult to maintain.
3. This function violates the Single Responsibility Principle, because it is responsible both for waiting on the Futures and for the business logic that combines their results.

A better way to do this is with Future composition:

[gist id=3815934 file=futureGood.scala]

Notice how much more readable the code is. The for expression is sugar for calling the map and flatMap methods on the Futures, and the benefit is that we can refer to the results of the Futures in the yield block without waiting for them. This makes the method read more like a workflow and it is no longer concerned with waiting for the completion of the tasks.
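Here is a comparable composed version, sketched with the same hypothetical lookups and Profile type (repeated so the block stands alone). It also follows the third rule below: a caller that builds on profile returns a Future instead of blocking.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for three datastore lookups that each return a Future.
def fetchScore(userId: Long): Future[Double]        = Future(55.0)
def fetchTopics(userId: Long): Future[List[String]] = Future(List("scala", "big data"))
def fetchMoments(userId: Long): Future[Int]         = Future(12)

case class Profile(userId: Long, score: Double, topics: List[String], moments: Int)

// Composed style: no thread waits; the result is itself a Future.
def profile(userId: Long): Future[Profile] = {
  // Start all three lookups first so they run concurrently...
  val scoreF   = fetchScore(userId)
  val topicsF  = fetchTopics(userId)
  val momentsF = fetchMoments(userId)
  // ...then describe how to combine them once they complete.
  for {
    score   <- scoreF
    topics  <- topicsF
    moments <- momentsF
  } yield Profile(userId, score, topics, moments)
}

// A caller that builds on profile() also returns a Future rather than blocking.
def profileSummary(userId: Long): Future[String] =
  profile(userId).map(p => s"${p.userId}: ${p.score} (${p.topics.size} topics)")

// Await only at the edge of this demo; in the API, Play completes the response instead.
println(Await.result(profileSummary(1L), 2.seconds))
```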

One difference between the two methods is that the first returns a raw Profile and the second returns a Future[Profile]. This points to an important realization we had while adding Futures to our code, which boils down to a few simple rules:

1. All methods that do IO should return a Future
2. Never block a thread waiting on a Future
3. Therefore, all methods that call other methods that return Futures must themselves return Futures.

In this way, we use Future almost like an IO monad (see this post for an introduction to functional IO). This allows us to push the Futures all the way up our call stack, finally wrapping them in Play’s AsyncResult in controller methods. Play handles these results in a non-blocking way, so we can be as efficient with IO as possible (see the Play documentation for more detail).

Overall, using Akka Futures allows us to write more efficient and more readable code. I suggest becoming very familiar with the methods on Futures and the different ways to compose them, especially since they will ship with Scala 2.10 and later. The ability to write concurrent IO so easily is the key to our API’s performance and scalability.

High Availability with Apache ZooKeeper and Akka Agents

Another lesson from the last six months is that dynamic service discovery is key. For example, our MySQL cluster has one master and several slaves, and we spread read requests across the slaves as much as possible. Sometimes we need to dynamically remove slave nodes from the pool because of degraded performance or scheduled maintenance. Since we run a large production cluster of API nodes, we need to be able to make these updates without redeploying or downtime, giving our clients the best experience possible and guaranteeing that all nodes update within seconds.

Apache ZooKeeper was an obvious solution for distributed configuration. To start, we created a simple wrapper for ZooKeeper on top of Twitter’s ZooKeeper client, which is written in Scala. We use this wrapper to watch ZooKeeper nodes, triggering a callback inside our application whenever a node changes:

[gist id=3815934 file=zk.scala]

But where should these callbacks go? One solution would be to create a service like this:

[gist id=3815934 file=stateBad.scala]

This service would keep track of the nodes to read from, and the ZooKeeper client would call updateState on the service. Any read request for MySQL would use the service to determine the pool of nodes to read from.

Again, there are a couple of problems with this:

1. The same class that deals with business logic is responsible for making updates, so the interface is not safe. Clients of this service would be able to make changes when we only want Zookeeper callbacks to make these changes.
2. This service would be prone to concurrency issues when multiple threads make updates. These issues only get worse if we want our “MySQL State” to track more than just “up” or “down”.
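A hedged sketch of what such a mutable service might look like (the class name, host names and method shapes are hypothetical, not the contents of the gist). It is deliberately flawed, to show both problems in code:

```scala
// Hypothetical sketch of the mutable-service approach, flaws included.
class MySqlReadPool(initial: Set[String]) {
  // Shared mutable state, read and written by many threads. Without @volatile or
  // locking, readers on other threads are not even guaranteed to see updates.
  private var up: Set[String] = initial

  // Intended only for ZooKeeper callbacks, but any caller can invoke it (problem 1).
  def updateState(host: String, isUp: Boolean): Unit = {
    // Unsynchronized read-modify-write: two concurrent callbacks can both read the
    // old set, and one of the updates is silently lost (problem 2).
    up = if (isUp) up + host else up - host
  }

  def readNodes: Set[String] = up
}

val pool = new MySqlReadPool(Set("db1", "db2", "db3"))
pool.updateState("db2", isUp = false) // fine from a single thread...
pool.updateState("db9", isUp = true)  // ...but nothing stops arbitrary callers
println(pool.readNodes)
```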

At first, we thought that Actors would be a good solution for this problem. However, we soon learned that because an actor processes its messages one at a time, reads would get backed up behind other messages. Also, asking an actor for its state returns a Future, which is more complex than necessary for a simple read.

Akka thankfully provides an implementation of Agents, based on the Clojure concept of the same name. An Agent wraps a piece of state and supports asynchronous, single-threaded updates and synchronous reads. For an agent of type T, an update is a function T => T, and the agent updates its state by applying that function to its previous value.

Here’s a similar implementation of the service above but using an Agent:

[gist id=3815934 file=stateGood.scala]

As you can see, the service is much simpler and the interface hides the updating access. Also, because the update functions are applied one at a time and simply add or remove nodes from the set, there is no concurrency issue. The biggest win, however, is that this allows us to think about our state as an immutable data structure, only responsible for dealing with business logic, and wrap the updating logic in the Agent. This gives us an elegant structure to our code.
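Akka’s Agent module has since been deprecated and removed, so the sketch below swaps in a minimal agent-like class built only on the standard library: a volatile field for synchronous reads and a single-threaded executor that applies T => T updates one at a time. SimpleAgent, ReadPool, MySqlReadService and the watch hook are all hypothetical names; watch stands in for the ZooKeeper wrapper handing callbacks to the service.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Minimal agent-like container (a stand-in for akka.agent.Agent, not its API):
// reads are synchronous, updates are T => T functions applied one at a time.
final class SimpleAgent[T](initial: T) {
  @volatile private var state: T = initial

  // One daemon thread applies every update, so updates never race each other.
  private val updater = ExecutionContext.fromExecutorService(
    Executors.newSingleThreadExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "agent-updater")
        t.setDaemon(true)
        t
      }
    }))

  // Synchronous read: never queues behind pending updates.
  def get: T = state

  // Asynchronous update; the Future completes with the value after f is applied.
  def send(f: T => T): Future[T] = Future {
    state = f(state)
    state
  }(updater)
}

// Immutable business state: just the set of healthy read slaves.
final case class ReadPool(nodes: Set[String]) {
  def markDown(host: String): ReadPool = copy(nodes = nodes - host)
  def markUp(host: String): ReadPool   = copy(nodes = nodes + host)
}

// The public interface only reads; updates arrive through the registered callback.
final class MySqlReadService(
    initial: Set[String],
    watch: ((String, Boolean) => Future[ReadPool]) => Unit) {
  private val agent = new SimpleAgent(ReadPool(initial))
  watch { (host, isUp) =>
    agent.send(pool => if (isUp) pool.markUp(host) else pool.markDown(host))
  }
  def readNodes: Set[String] = agent.get.nodes
}

// Usage: capture the callback the service registers, then simulate a ZooKeeper event.
var fireZkEvent: (String, Boolean) => Future[ReadPool] =
  (_, _) => Future.failed(new IllegalStateException("not registered"))
val service = new MySqlReadService(Set("db1", "db2", "db3"), cb => fireZkEvent = cb)
Await.result(fireZkEvent("db2", false), 1.second)
println(service.readNodes)
```

Reads never wait behind the update queue, which is exactly what the actor-based attempt was missing.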

We like this pattern so much that we use it wherever we have dynamic configuration that depends on ZooKeeper. It’s a great abstraction that makes our code both more reliable and more readable. And having this mechanism for dynamic service discovery gives the API the fault tolerance and high availability it needs to meet its SLA.

Even if you are not using the Actor Model, Akka provides many tools for building large concurrent enterprise systems. In some cases, other abstractions are simpler to use and require less boilerplate than Actors. At Klout, we believe in using the right tool for the job, so to help the Klout API meet its SLA, we have used Futures and Agents heavily. In doing so, we hope to push more and more of Klout’s data out into the world. Let us know if you want to help.

Posted in engineering | 14 Comments »

The Stories Behind the Score

Friday, August 24th, 2012


Last week Klout released a major update to our score model and a preview of our new moments feature that showcases your most influential social media activity. Internally the codename for this project was “Maxwell”, for the Scottish physicist James Clerk Maxwell, whose discoveries helped pave the way for modern physics.

The Maxwell project brought together a team of engineers, designers and product managers across Klout in a months-long effort to evolve the Klout Score and rethink how people can benefit from the insights Klout enables. We’ve gathered stories from a handful of the members of the Maxwell project for you here.

Big Data, Big Challenges
Andras Benke: I’m the technical leader of the Maxwell data and science team. We built up a big data pipeline based on a brand new architecture and completely replaced our scoring system.

Adithya Rao: I was responsible for building the scoring models for each network individually, and combining them together into the final Klout score. This also included extracting information from all the networks, as well as real world sources and using them meaningfully to count towards the score. Continuously tuning the models to be accurate, while meeting users’ expectations, was probably the most challenging problem I faced during the project.

Nemanja Spasojevic: I worked on the backend data pipeline. As the project progressed, we had to keep up with increasing demands for data quality and a high pace of iteration, given the size of the data. Processing and reprocessing big data fast enough to sustain the development pace, while still maintaining the legacy pipeline, took a lot of nerve from the team.

Girish Lingappa: During development and testing, we had to dig through millions of users’ scores and billions of messages to debug issues. Sometimes it felt like looking for a needle in a haystack.

David Ross: I am responsible for the data serving infrastructure for the Score and moments, including designing fast, scalable, and reliable API endpoints. Moments is the most challenging serving problem Klout has ever tackled. If you think about the amount of data that needs to be served to millions of users, moments is quite the scalability challenge. After many whiteboard sessions and iterations, we are making it happen.

Jerome Banks: I built out most of the data pipeline. Technically, we had to deal with large datasets while still remaining agile.

Designing the Vision
Matt Sperling: The biggest challenge from a design standpoint was shifting the focus from the score to the story behind the score. Our solution was to show people the social interactions impacting their score on the dashboard, while giving them a showcase of their best moments on their public facing profile. Hence the design of interaction and moments. Alongside this we did a pretty substantial redesign of the site.

Mark Azevedo: I was responsible for building the original moments user interface prototypes. Now that we are no longer in the prototype phase, we will continue iterating on moments as a platform for revealing influence in your everyday social interactions. The Klout.com team who worked on the site redesign made it possible to support moments as a product.

Maxwell in the Real World
Adithya Rao: I am proud to be part of a team which was able to crunch such huge amounts of data and create a product that is hopefully going to redefine how people consume social data on the internet.

Nemanja Spasojevic: In the end, it’s the users who really matter. Being able to deliver on such an ambitious goal is the dream of any engineer.

Matt Sperling: I’m thrilled that we are giving Klout users a profile they can be proud of. It’s something they can show off no matter what their score.

Andras Benke: We are processing an amazing amount of data every day. Creating a stable system which is able to do this is pretty challenging even with today’s big data technologies.

What’s Next
The launch of the updated Klout score and moments is just one step on the road to helping everyone discover their influence. We’re excited about this release but we’ve already begun work on more great new features to come.

Posted in design, engineering | 29 Comments »

Sexy API from Klout

Tuesday, March 13th, 2012


Klout had an incredible 2011, and as the end of the year approached I felt so proud to be part of such an amazing team! We accomplished a lot, so the holidays came at a great time; it was a much-needed opportunity for the team to catch a break and have some fun with our families. At the same time, I was excited about all the incredible things we were gonna have a blast building. So when January 2nd came along, I was back at Klout HQ ready to go; as is customary at Klout, there was a fun challenge ahead. We’ve been hard at work for the past few months and we’re excited to share with you what we’ve built.

The challenge was to re-engineer Klout’s API to sustain its rapid growth and massive traffic—ten billion API calls a month and growing—and also to empower our clients with all the features currently available on Klout.com.

So I started by concentrating on the problem at hand; even though it goes against our instincts as engineers, I find that much more productive than jumping straight to a solution or an implementation detail. There’s a small, select group of companies serving that kind of massive API traffic, but one peculiar number screams out of the infographic above. Klout is the only one with fewer than seven hundred employees; we actually have only about 76 troopers! And what does that number represent? It tells me that as fast as our API should be, we need to move even faster as a team: we need killer productivity! Besides, everyone likes to go fast!

Let’s Play!

As seems to be a recurring theme in my life (see more on Why Did I Fall in Love with Play! Framework?), the technology stack chosen was the always awesome, lean-and-mean, super-duper productive Play! Framework. As I mentioned in my previous blog post, Find Your Klout, Play was designed to provide a powerful, easy-to-extend infrastructure; it uses fast non-blocking IO and a stateless model that makes horizontal scaling a cakewalk. There is such a joy that comes every time my terminal sings “play new klout-* --with scala”. And with this type of traffic, our API might just be the most heavily used Play! Framework application to date!

Our API has Swagger

We were obviously going to create a RESTful API, but some components of SOAP (Simple Object Access Protocol) have a lot of value. Even though I wanted no part of that business in any shape or form, new or old, with its verbose XML format (remember the productivity factor), the WSDL (Web Services Description Language) does provide features that aren’t commonly seen in REST. The WSDL allows client proxies to be generated automatically (wsdl2java comes to mind), which makes it easy for developers to create client interfaces. How could we accomplish that without deep-diving into thousands of lines of XML? Swagger, son! Here’s how Wordnik defines their marvel: “Swagger is a specification and complete framework implementation for describing, producing, consuming, and visualizing RESTful web services.” Swagger gives our API a well-defined contract in JSON, the same format we use on every single endpoint.

“Talk is cheap. Show me the code.” – Linus Torvalds

  • First of all, add the dependency on Swagger Play to your conf/dependencies.yml and run “play deps”. Then go ahead and define your data transfer class.
  • Now it’s time to define the controller. All the @Api* annotations are provided by Swagger. The code you are about to see doesn’t do much other than instantiate the Score class and set its value to 100 (I wish that was my Klout score).
  • We created an API trait to define the business logic shared by every endpoint. This trait provides an api method that we wrap each controller action with; the method does the validation and JSON transformation, and it tracks the number of calls and response times with StatsD using David Ross’ super useful StatsD module. Stay tuned to our blog for an upcoming post on StatsD and our monitoring infrastructure.
  • Now it’s time to define our REST-friendly route for our endpoint, a walk in the park with my beloved Play! Framework. To do that, add the following line to conf/routes.
  • Caching is being done using Memcached, re-using Play’s support for it. Jay Taylor, Klout Perks’ lead engineer, wrote a nice wrapper for it.
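To give a feel for the wrapper idea in the third step without depending on Play or Swagger, here is a framework-free sketch. The Metrics object is a hypothetical in-memory stand-in for the StatsD module, ApiResponse is a hypothetical response type, and the action returns an already-rendered JSON string where the real trait performed the JSON transformation itself.

```scala
import java.util.concurrent.atomic.LongAdder
import scala.collection.concurrent.TrieMap

// Hypothetical metrics sink standing in for StatsD: counters and timers kept in memory.
object Metrics {
  val counts     = TrieMap.empty[String, LongAdder]
  val lastMillis = TrieMap.empty[String, Long]
  def increment(key: String): Unit = counts.getOrElseUpdate(key, new LongAdder).increment()
  def timing(key: String, ms: Long): Unit = lastMillis.put(key, ms)
}

// Hypothetical response type.
final case class ApiResponse(status: Int, body: String)

trait Api {
  // Wraps an endpoint: count the call, validate input, run the action, record timing.
  def api(name: String)(validate: => Either[String, Unit])(action: => String): ApiResponse = {
    val start = System.nanoTime()
    Metrics.increment(s"api.$name.calls")
    val response = validate match {
      case Left(error) => ApiResponse(400, s"""{"error":"$error"}""") // fixed messages only
      case Right(_)    => ApiResponse(200, action)
    }
    Metrics.timing(s"api.$name.time", (System.nanoTime() - start) / 1000000)
    response
  }
}

// Every controller action goes through api(...), so metrics come for free.
object ScoreController extends Api {
  def score(userId: Long): ApiResponse =
    api("score")(if (userId > 0) Right(()) else Left("invalid userId")) {
      s"""{"userId":$userId,"score":100.0}"""
    }
}

val ok  = ScoreController.score(42L)
val bad = ScoreController.score(0L)
println(ok)
println(bad)
```

Keeping cross-cutting concerns in one wrapper means each endpoint body stays a few lines of business logic.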

We’re excited that the Play framework is now commercially supported by Typesafe, along with Akka and Scala, all of which are firing on all cylinders at Klout. Building on a modern foundation like the Typesafe Stack makes it much easier for our development team to punch above its weight!

If you are down to Play! come join us and follow us on Twitter at @_felipera, @dyross, @ladlestein.

Unlock Your Klout!

Posted in engineering | 12 Comments »

Find Your Klout

Friday, December 9th, 2011

At Klout, we love data and as Dave Mariani, Klout’s VP of Engineering, stated in his latest blog post, we’ve got lots of it! Klout currently uses Hadoop to crunch large volumes of data but what do we do with that data? You already know about the Klout score, but I want to talk about a new feature I’m extremely excited about — search!

Problem at Hand
I just want to start off by saying: search is hard! Yet the requirements were pretty simple: we needed to create a robust solution that would allow us to search across all scored Klout users. Did I mention it had to be fast? Everyone likes to go fast! The problem is that 100 million people have Klout (and that was this past September, an eternity in social media time), which means our search solution had to scale, and scale horizontally.

So how did we accomplish that?

Share Nothing and Don’t Block
We use Node.js in our front end to help scale to thousands of concurrent users, and we follow the same philosophy in our search backend. Given the size of our dataset and its substantial growth rate, we needed a search solution that would allow us to scale horizontally. On the application side, we wanted a stateless Web layer, not only for performance but also for manageability. So share nothing and block as little as possible!

Let’s Play! and be “cool, bonsai cool”
The technology stack chosen to address the problem was ElasticSearch and the Play! Framework. Why did we choose that stack? At Klout, we like to choose the right tool for the job, regardless of the platform it runs under or the company that’s behind it.  We chose ElasticSearch and Play! because both of these were designed to use fast, non-blocking IO, both of these provide powerful infrastructure, and both of these were designed to be easy to extend.  These tools help us build powerful search now, and continue improving search to give you more relevant results.

ElasticSearch is a powerful, scalable, distributed search solution built on strong foundations like JBoss Netty and Apache Lucene. Lucene, a personal favorite of mine, was created by Doug Cutting, who has had a huge impact on many tools we use at Klout; he is also the creator of Hadoop (and Nutch, for that matter!). Lucene is a search library, more than 10 years old, that provides powerful search capabilities such as relevancy ranking, fuzzy matching, wildcards, proximity operators, fielded searching, spell-checking, multilingual support and all that jazz, all while remaining completely portable since it’s a JVM-based solution. Most important, it’s blazing fast!

ElasticSearch uses JBoss Netty as its network library for asynchronous, non-blocking IO. In a traditional blocking IO model, performing a search across multiple shards would be extremely expensive: we could retrieve results serially, meaning our search would get slower as our data grew, or query the shards in parallel threads, which would require ever-increasing processing resources. Netty allows ElasticSearch to retrieve results from multiple search nodes in parallel without tying up a blocked thread for each outstanding request.
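The same scatter-gather idea can be sketched with Scala Futures. This is a generic illustration, not ElasticSearch’s internals or client API: each hypothetical Shard is an in-memory list standing in for a remote node, every shard is queried at once, and the per-shard top-k lists are merged into a global top k.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

final case class Hit(name: String, score: Double)

// Hypothetical shard: an in-memory list standing in for a remote search node.
final case class Shard(docs: Seq[Hit]) {
  def search(term: String, k: Int): Future[Seq[Hit]] = Future {
    docs.filter(hit => hit.name.toLowerCase.contains(term.toLowerCase))
      .sortBy(hit => -hit.score)
      .take(k)
  }
}

// Scatter the query to every shard at once, then gather and merge the top k.
// No thread blocks while the shards work; the merge runs when all have answered.
def searchAll(shards: Seq[Shard], term: String, k: Int): Future[Seq[Hit]] =
  Future.sequence(shards.map(_.search(term, k)))
    .map(perShard => perShard.flatten.sortBy(hit => -hit.score).take(k))

val shards = Seq(
  Shard(Seq(Hit("felipera", 70), Hit("dwollen", 60))),
  Shard(Seq(Hit("felix", 80), Hit("joe", 75)))
)
val top = Await.result(searchAll(shards, "fel", 2), 2.seconds)
println(top)
```

Asking each shard for only its own top k is enough: any document in the global top k must also be in the top k of the shard that holds it.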

We used the Play! Framework for the Web layer; it also uses JBoss Netty as its network library. Why? To find out more about this great framework, watch my Dreamforce presentation from this past September here in San Francisco, CA: “Introducing Play! Framework: Painless Java and Scala Web Applications”. Play! recently joined Typesafe, the company behind Scala, as an official part of its Scala-based technology stack and as its Web solution for Scala.

Akka is also part of Typesafe’s stack and provides an event-driven and self-healing concurrency platform based on an Erlang-style, actor-based concurrency model for the JVM. In summary, Akka helps Klout’s search go fast! We have actors for the different searches we support, messages are dispatched to their mailboxes as Play’s controller actions are invoked. Akka actors, which are pretty similar to Scala actors, allow us to effortlessly execute parallel searches to minimize overall response time to provide our users the best experience possible.

If you are down to Play! come join us and follow us on Twitter as @_felipera and @dwollen.

Happy Searching!

Posted in engineering | 13 Comments »

Big Data, Bigger Brains

Thursday, November 3rd, 2011

The web continues to produce a deluge of data and signals about users and their activities online. At Klout, we are doing our best to translate these signals into a reliable measure of user influence and reach. Like many other startups and a growing number of large enterprises, Klout uses Hadoop to crunch large volumes of data using clusters of commodity servers. At Klout, we capture and process over 3 billion signals a day, and Hadoop is an excellent, horizontally scalable platform for doing just that.

What Ever Happened to Business Intelligence?

Besides ingesting and processing big data, we also need to deliver deep data analytics for our internal and external customers.   We need to uncover traffic and engagement trends for improving our consumer experiences at Klout.com while driving ROI for our advertisers and business partners.  Hadoop, by its nature, is a batch processing system and is not yet suitable for delivering interactive, “anything by anything” queries.  But there is hope for those who need to support business intelligence workloads for Big Data sets, using open source software and inexpensive hardware.  Hive, which runs on Hadoop, exposes a SQL interface and presents a relational “database” view of your data on top of your raw Hadoop files.  But Hive can’t support real time, highly interactive business intelligence queries because it still behaves like a batch processing system, generating MapReduce code to do its work.

A Business Intelligence “Index” for Hive

At Klout, we developed a unique solution to this problem that leverages Hadoop’s scale and cost-effectiveness while delivering “speed of thought” ad hoc queries. The key was to create a multi-dimensional query “index”, or cube, that sits in front of Hive and serves real-time, ad hoc, “anything by anything” queries using the upcoming version of Microsoft SQL Server Analysis Services (code name “Denali”). Yes, you read that right. We use a Windows-based multi-dimensional (MOLAP) product from Microsoft to load 350 million rows of Hive data per day and achieve an average query response time of under 10 seconds on 35 billion rows of data. All queries are served by a single $7,000 server with internal RAID storage.

An Unholy Alliance?

With all of the specialized business intelligence appliances on the market (Vertica, Greenplum, Aster Data, Netezza, Teradata), why did we pick Microsoft SQL Server Analysis Services (SSAS)? Because it’s a full-featured business intelligence engine: it’s in active development, it’s inexpensive, it has widespread query tool support and great documentation, and it scales. How much does it scale? At Yahoo!, I deployed SQL Server Analysis Services to support our display advertising business, which ingested 3.5 billion rows a day and delivered average query times of less than 7 seconds on 500 billion rows of data. Just as important, SSAS presents data to end users in business terms, as a cube with measures and dimensions, hiding the complexities of SQL and delivering a rich semantic layer on top of your raw, unstructured Hadoop data. Hive can do what it does best by providing a cloud-based, inexpensive, centralized data warehouse, while SSAS manages all the data aggregates to support real-time ad hoc queries. In fact, delivering equivalent, performant cube functionality with traditional SQL databases would require generating thousands of data aggregates, creating complexity and making system changes onerous.
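To see why the number of aggregates explodes, consider a toy cube. The sketch below is a simplified illustration of the idea, not how SSAS stores or selects aggregations: it precomputes a sum for every subset of dimensions (every possible GROUP BY), so an “anything by anything” query becomes a map lookup. With n dimensions there are 2^n subsets, and a real cube multiplies that by every level of every hierarchy. The Fact type and dimension names are hypothetical.

```scala
// Toy fact row: dimension values plus one measure.
final case class Fact(dims: Map[String, String], signals: Long)

// Precompute the measure for every subset of dimensions (2^n group-bys).
// Keys hold only the dimensions in that subset; the empty key is the grand total.
def buildCube(facts: Seq[Fact], dimensions: List[String]): Map[Map[String, String], Long] = {
  val groupBys = dimensions.toSet.subsets().toList
  val keyed = for {
    dims <- groupBys
    fact <- facts
  } yield (fact.dims.filter { case (d, _) => dims.contains(d) }, fact.signals)
  keyed.groupBy(_._1).map { case (key, rows) => key -> rows.map(_._2).sum }
}

val facts = Seq(
  Fact(Map("network" -> "twitter", "country" -> "us"), 10),
  Fact(Map("network" -> "twitter", "country" -> "uk"), 5),
  Fact(Map("network" -> "facebook", "country" -> "us"), 7)
)
val cube = buildCube(facts, List("network", "country"))

// Any slice is now a lookup instead of a scan.
println(cube(Map("network" -> "twitter")))
println(cube(Map("country" -> "us")))
```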

What’s Wrong With This Picture?

There’s only one problem with this solution: there doesn’t yet exist a direct connection between Hive and SSAS for loading a cube. So we use Sqoop to load each day’s data into MySQL, essentially using MySQL as a staging area for loading data into the cube. This solution works and scales pretty well, but it introduces unnecessary data latency.

What’s Next

We’re working with Microsoft to develop direct connectivity between Hive and our cube using their newly announced support for Hive and Hadoop. By eliminating the staging server, we will reduce our data latency dramatically and eliminate a key dependency. The good news is that you can do what we did here at Klout right now, at no cost: CTP3 of SQL Server Code Named “Denali” is a free download, and it’s also available through your MSDN account. Stay tuned for further updates.

Do you want to take on Big Data challenges like this? We’re looking for great engineers who can think out of the box.  Check us out.

Posted in engineering | 68 Comments »

A More Accurate, Transparent Klout Score

Wednesday, October 26th, 2011

Today we’re releasing a new scoring model with insights to help you understand changes in your influence. This project represents the biggest step forward in accuracy, transparency and our technology in Klout’s history. Joe shared the full vision behind these changes in his post last week.

Influence is the ability to drive action and is based on quality, not quantity. When someone engages with your content, we assess that action in the context of the person’s own activity. These principles form the basis of our PeopleRank algorithm, which determines your Score based on:

  • how many people you influence,
  • how much you influence them and
  • how influential they are.
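Klout doesn’t publish the PeopleRank formula, so the following is only a toy illustration of the third bullet, using classic PageRank power iteration as a stand-in technique. The engagement graph and names are hypothetical: each action passes a share of the actor’s own rank to the author, so an engagement from someone who is widely engaged with counts for more.

```python
def people_rank(engagements, damping=0.85, iterations=50):
    """Toy PageRank-style power iteration over an engagement graph.

    engagements maps each person to the people whose content they acted on;
    an action passes a share of the actor's own rank to the author.
    """
    people = set(engagements)
    for authors in engagements.values():
        people.update(authors)
    n = len(people)
    rank = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        # Rank held by people who act on nobody's content is spread evenly.
        dangling = sum(rank[p] for p in people if not engagements.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in people}
        for actor, authors in engagements.items():
            if authors:
                share = damping * rank[actor] / len(authors)
                for author in authors:
                    new_rank[author] += share
        rank = new_rank
    return rank


# Hypothetical graph: three people act on dave's posts, dave acts on erin's,
# and gina (whom nobody engages with) acts on frank's.
graph = {
    "ann": ["dave"], "ben": ["dave"], "cat": ["dave"],
    "dave": ["erin"], "gina": ["frank"],
}
scores = people_rank(graph)
for person, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{person:5s} {value:.3f}")
```

Erin and frank each receive a single engagement, yet erin ends up ranked highest (above dave, then frank) because her one engager is himself influential.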

We analyze 2.7 billion pieces of content and connections daily. To reach this scale, we’ve made significant upgrades to our platform so it can handle this explosive growth. Now we can add more networks and other sources of your influence much, much faster.

Insights help you understand why your Score changed. Each day, you can see which subscores and which people in your network caused that change. You can also view insights on your friends’ profiles.

These changes are a significant milestone in the Klout Score’s evolution and you can continue to expect more improvements in the future. As always, your opinion is very important to us and we’d love to hear your feedback.

How will this affect my Score?

A majority of users will see their Scores stay the same or go up but some users will see a drop. In fact, some of our Scores here at the Klout HQ will drop — our goal is accuracy above all else. We believe our users will be pleased with the improvements we’ve made. Below is a distribution of the Score changes. You’ll note large decreases in Score are rare.

Posted in announcements, engineering, measuring influence | 1,902 Comments »

A New Era for Klout Scores

Wednesday, October 19th, 2011

More than three years ago the Klout Score was born in my bedroom in New York City as a way to make sense of the noise I was seeing in social media. I could share my opinion about anything, instantly, with the people who trust me and the data was available to measure my impact. Fast forward to today and we now have over 3,500 companies using the Klout Score to reward influencers with Klout Perks, give better customer service, reward loyalty, recruit, and much more.

The biggest change in the past three years is that (thankfully) we have people way smarter than me spending each and every day improving the algorithms that calculate the Klout Score. I am incredibly proud of the work the team has done, and I am excited to announce that the biggest improvement to the Klout Score in our history launches next week.

PeopleRank
We’ve often thought of what we’re doing as a form of PeopleRank, and this is a giant step in that direction. We’ve improved the stability and accuracy of our Scores. Our subscores have always been an important part of Klout, and this update will make them clearer and their changes easier to understand.

True Reach
True Reach is the number of people you influence. It is an actual count of people, which we determine by looking at the impact you have on your connections. We analyze over two and a half billion connections and pieces of content every day to accurately gauge who is in your True Reach.
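To make the difference between "connections" and "people you influence" concrete, here is a minimal sketch that counts only the distinct people who actually acted on your content. The `(actor, author)` action format and the `min_actions` threshold are hypothetical assumptions for illustration, not Klout’s actual criteria.

```python
from collections import Counter


def true_reach(actions, author, min_actions=2):
    """Count distinct people who acted on `author`'s content at least
    `min_actions` times. `actions` is a list of (actor, author) pairs;
    the threshold is a hypothetical stand-in for a real influence test."""
    counts = Counter(
        actor for actor, target in actions if target == author and actor != author
    )
    return sum(1 for count in counts.values() if count >= min_actions)


actions = [
    ("ann", "you"), ("ann", "you"),
    ("ben", "you"),
    ("cat", "you"), ("cat", "you"), ("cat", "you"),
    ("dan", "other"),
]
print(true_reach(actions, "you"))                 # 2 (ann and cat)
print(true_reach(actions, "you", min_actions=1))  # 3
```

However many followers "you" has, only people who show up in the action stream can count toward this number.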

Amplification
Amplification is how much you influence these people. We analyze how many people in your potential audience act upon your content. We take this a step further and understand what an influence signal means in the context of that person. For instance, if I rarely like or comment on anyone’s posts but choose to do so on yours, that is more meaningful than if I like 60 posts a day. Amplification indicates the effect you have on your audience.
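The "rare liker versus 60 likes a day" example can be expressed as a simple weighting. This sketch assumes a hypothetical per-person average of daily likes and comments and discounts each signal by it; Klout’s real weighting is not described in the post.

```python
def amplification_signal(engagers, daily_actions):
    """Sum engagement signals, discounting each by how freely that person engages.

    daily_actions maps a person to their (hypothetical) average number of likes
    and comments per day. Anyone at or below one action a day, or with no
    recorded history, counts in full (weight 1.0).
    """
    return sum(1.0 / max(daily_actions.get(person, 1.0), 1.0) for person in engagers)


daily_actions = {"rare_liker": 0.2, "prolific_liker": 60}
print(amplification_signal(["rare_liker"], daily_actions))      # 1.0
print(amplification_signal(["prolific_liker"], daily_actions))  # 1/60, about 0.0167
```

A single like from someone who rarely engages is worth as much here as likes from sixty people who like 60 posts a day.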

Network Impact
Network Impact is the influence of your audience. This is on a 1 to 100 scale and indicates the influence level of people who engage with your content. It’s not just about how many people you reach, it’s about getting your message to the right people. Having more connections won’t help your Network Impact, but having influential connections will.
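A minimal sketch of why more connections alone don’t raise Network Impact: if the subscore is modeled as the average 1 to 100 score of the distinct people who engage with you (an assumption for illustration; the post doesn’t give the formula), adding low-influence engagers pulls it down rather than up.

```python
def network_impact(engager_scores):
    """Average 1-100 score of the distinct people who engaged with your content.

    engager_scores maps each engager to their own score (hypothetical input).
    Because this is a mean rather than a total, adding low-influence engagers
    cannot raise it; influential engagers do. With no engagers it returns the
    bottom of the scale.
    """
    if not engager_scores:
        return 1.0
    mean = sum(engager_scores.values()) / len(engager_scores)
    return min(100.0, max(1.0, mean))


print(network_impact({"a": 80, "b": 70}))           # 75.0
print(network_impact({"a": 80, "b": 70, "c": 10}))  # lower: about 53.3
```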

Accuracy & Transparency
The subscores contribute to one overall score, the Klout Score. We’ve always been transparent about the various activities that could impact your Klout Score but we now have the power to share the specific actions that are helping or hurting your score. When your Klout Score changes you will be able to match it to a corresponding change in one of these subscores and understand why the change has occurred. If your Score goes up because more top influencers are acting upon your content, we will share that with you.
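Matching a Score change to "a corresponding change in one of these subscores" can be sketched as a day-over-day comparison. Relative change is an assumption made here because the subscores live on different scales; the subscore keys and values are hypothetical.

```python
def explain_change(yesterday, today):
    """Rank subscores by relative day-over-day change, largest first.

    Relative change is used because the subscores live on different scales
    (a head count vs. 1-100 values); subscores missing yesterday or at zero
    are skipped to avoid dividing by zero.
    """
    changes = {
        name: (today[name] - yesterday[name]) / yesterday[name]
        for name in today
        if yesterday.get(name)
    }
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)


yesterday = {"true_reach": 1200, "amplification": 30, "network_impact": 45}
today = {"true_reach": 1212, "amplification": 27, "network_impact": 46}
ranked = explain_change(yesterday, today)
for name, change in ranked:
    print(f"{name:15s} {change:+.1%}")
```

Here the drop in amplification (-10%) is the largest mover, so it would be the first candidate explanation for a lower Score.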

Influence is the Ability to Drive Action
The core premise behind our algorithms has always been that influence is the ability to drive action. We have tightened this concept even further in this release. You are not more influential because you tweet or use Facebook more, you are influential because you have an influential audience engaging with your content.

The Standard for Influence
With thousands of companies and millions of people leveraging the Klout Score, we take our role as the standard for measuring influence incredibly seriously. We are very early in what we view as a long journey. The team here at Klout is thrilled about the challenge ahead of us and is completely dedicated to creating the most accurate measurement of influence in the world. To that end, you can expect the way we measure influence to continue to evolve as behaviors change on the social web or as new networks like Google+ emerge. Most of the time these changes will be incremental and invisible to most people, but this world changes fast, and occasionally you can expect us to make significant changes like the one we are launching next week.

And of course, I know you want to know…

How will this affect my Score?
A majority of users will see their Scores stay the same or go up but some users will see a drop. Some of our Scores here at the Klout HQ will drop (including mine) — our goal is accuracy above all else. We believe our users will be pleased with the improvements we’ve made.

This is a project that’s been under development for over three months, and, in many ways, over the three years since Klout started. We appreciate your trust and support and we can’t wait to hear what you think. We will let you know when this new model goes live next week and will continue to work to provide the deepest and most accurate insights into your influence possible.

Posted in announcements, engineering, measuring influence | 358 Comments »

The Tech Behind Klout.com

Tuesday, October 4th, 2011

In May, we unveiled the current version of Klout.com. Not only did we put the site through a major visual/UX/UI redesign, we rewrote the entire front-end web application from the ground up. Here’s an overview of the tech that drives the user experience of Klout.com.

The Old Stack
Klout is a data-driven company, and our first product was the Klout API. We repurposed a lot of the early API code (PHP) to power the first iterations of Klout.com on the LAMP (Linux/Apache/MySQL/PHP) stack so that users could access their own scores. As Klout’s user base began to grow, scaling the performance of the website was getting in the way of creating interesting new features. We knew we would have to refactor the site to keep up with traffic and to create a flexible foundation for feature development. Given carte blanche, we wanted to do something innovative and new while having fun with high-performance web technologies. Inspired by #NewTwitter and the work of JavaScript luminaries like Ryan Dahl and Yehuda Katz, we chose to go where few had boldly gone before: a fully JavaScript-based web application.

Enter Node.js
We were intrigued by the nascent server-side applications of JavaScript. Google’s V8 engine was incredibly fast, and Ryan Dahl’s node.js was the missing evented I/O layer that finally made running JavaScript on the server a reality. Packages like Express made handling routing and content negotiation seamless. In our tests, a single node.js process was able to handle thousands of concurrent connections in a very CPU-efficient manner. Plus, using JavaScript on both the server and the client made writing multi-purpose code very straightforward. We knew of other companies using node.js at the time, but most were using it to serve APIs. It seemed nobody was brave (or crazy) enough to serve an entire website with it… yet.
Betting everything on such a young technology was a risk, but in the spirit of trailblazing and entrepreneurship, Klout wanted to help prove out this very promising architecture. Ryan Dahl and Joyent helped us over hurdles in our early development and listened to our feedback. As a result, Klout is running on node.js in 16 instances across two servers in our own data center, serving tens of thousands of concurrent users.
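The claim that a single node.js process can handle thousands of concurrent connections rests on evented, non-blocking I/O: one thread runs an event loop and switches to other work whenever a request is waiting. The sketch below illustrates the same model with Python’s asyncio rather than node.js; it is an analogy, not Klout’s server code, and the 0.1-second sleep stands in for a hypothetical database or API call.

```python
import asyncio
import time


async def handle_request(request_id):
    # Stand-in for a non-blocking call (database, API, cache); while this
    # coroutine waits, the event loop runs other requests on the same thread.
    await asyncio.sleep(0.1)
    return request_id


async def serve(concurrent_requests):
    # Start every request at once and wait for all of them to finish.
    return await asyncio.gather(*(handle_request(i) for i in range(concurrent_requests)))


start = time.perf_counter()
results = asyncio.run(serve(1000))
elapsed = time.perf_counter() - start
print(f"{len(results)} requests in {elapsed:.2f}s on one thread")
```

Because the waits overlap, the total time stays close to a single 0.1-second wait rather than 1,000 of them; a thread-per-connection server would need 1,000 threads to do the same.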

The Client-Side Application
Most of Klout.com actually runs in your own browser: we use Backbone.js to provide an MVC (model-view-controller) structure on the client side. After your first page load, your browser begins talking to our node.js application servers to get bits of information in JSON (JavaScript Object Notation), such as scores, topics, and chart data. As you navigate around the site, only the relevant, changed portions of the page refresh and redraw. We use Yehuda Katz’s Handlebars.js for minimal, semantic templating and LESS CSS for dynamic stylesheet programming. All of this adds up to a very snappy user experience on Klout.com.
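Redrawing only the changed portions of the page works because views subscribe to per-attribute change events on their models. Backbone itself is JavaScript; the sketch below is a minimal Python illustration of that observer pattern, loosely modeled on Backbone’s `change:<attribute>` events and not its actual implementation. The view names and JSON payload are hypothetical.

```python
import json


class Model:
    """Minimal observable model, loosely modeled on Backbone's per-attribute
    "change:<attr>" events; not Backbone's actual implementation."""

    def __init__(self, **attributes):
        self.attributes = dict(attributes)
        self._listeners = {}

    def on(self, event, callback):
        self._listeners.setdefault(event, []).append(callback)

    def set(self, **attributes):
        # Keep only the values that actually differ from the current state.
        changed = {k: v for k, v in attributes.items() if self.attributes.get(k) != v}
        self.attributes.update(changed)
        for key, value in changed.items():
            for callback in self._listeners.get(f"change:{key}", []):
                callback(value)
        return changed


redrawn = []
profile = Model(score=50, topics=["social media"])
profile.on("change:score", lambda value: redrawn.append(("score_view", value)))
profile.on("change:topics", lambda value: redrawn.append(("topics_view", value)))

# A hypothetical JSON payload from the application servers.
payload = json.loads('{"score": 52, "topics": ["social media"]}')
profile.set(**payload)
print(redrawn)  # [('score_view', 52)]
```

The topics list arrives unchanged, so only the score view is asked to redraw.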

Open Source
We depend on many open-source technologies, including Node.js, connect/Express, Backbone.js, Handlebars, Underscore, jQuery, and Redis. Klout is grateful to the open-source community and is committed to contributing back to projects that have helped us build such a rich and modern web application.
Klout continues to evolve as this new stack evolves, and we’re excited to be at the forefront of what we believe to be the future of web application development. Each stable release of each component of our stack brings new performance and stability enhancements. Staying on the bleeding edge of web technology is a huge effort, but the end result is a constantly improving experience for our users.

If you want to set the standard for the future of web application engineering while helping everyone in the world unlock their Klout, join us!

Posted in engineering, measuring influence | 38 Comments »