When Seattle’s tech community pulled together last year to help recruit Carlos Guestrin, a standout machine-learning expert and data scientist, to the University of Washington, some people were hoping a hot startup would wind up here as well.
They weren’t disappointed. Guestrin is launching GraphLab Inc. with a $6.75 million investment led by Madrona Venture Group and New Enterprise Associates (NEA). The company is commercializing an open source technology for analyzing enormous, complex datasets.
The GraphLab technology was born five years ago at Carnegie Mellon University, where Guestrin and his group were working on large-scale machine-learning algorithms to extract and analyze relationships between entities in multi-dimensional graph datasets.
Unlike structured relational databases, these modern datasets consist of nodes that represent objects and edges that represent the relationships among them. Today, graph datasets may include hundreds of billions of objects and power things like social networks, online reviews, and recommendation engines. Facebook, for example, is a huge graph dataset in which the nodes are people, pictures, companies, and other entities, and the edges are the “friendships,” likes, and tags that link them together.
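To make the node-and-edge idea concrete, here is a minimal Python sketch of a tiny Facebook-style graph stored as an adjacency list. The entity names and relationship labels (`friend`, `likes`, `tagged_in`) and the `neighbors` helper are hypothetical and purely illustrative; this is not GraphLab's API or data format.

```python
from collections import defaultdict

# Hypothetical toy social graph: each node is an entity with a type.
# Names and types are illustrative only; this is not GraphLab's API.
nodes = {
    "alice": "person",
    "bob": "person",
    "photo_1": "picture",
    "acme": "company",
}

# Each edge is (source, relationship, target). Friendship is mutual,
# so it is stored in both directions.
edges = [
    ("alice", "friend", "bob"),
    ("bob", "friend", "alice"),
    ("alice", "likes", "acme"),
    ("bob", "tagged_in", "photo_1"),
]

# Adjacency list: for each node, its outgoing (relationship, target) pairs.
adjacency = defaultdict(list)
for src, rel, dst in edges:
    adjacency[src].append((rel, dst))


def neighbors(node, relationship=None):
    """Return targets linked from `node`, optionally filtered by relationship type."""
    return [dst for rel, dst in adjacency[node]
            if relationship is None or rel == relationship]


print(neighbors("alice"))          # ['bob', 'acme']
print(neighbors("bob", "friend"))  # ['alice']
```

Questions like "who are Alice's friends?" become short walks along edges rather than joins across tables, which is the kind of structure graph-oriented systems are built to exploit at much larger scale.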
“What’s challenging is how you interact with this data at this scale in a fast way,” Guestrin says.
At Carnegie Mellon, his team ran up against the limits of existing software and “built a little system” of their own—GraphLab—to push the state of the art, he says.
“We threw it out into the open source community, kind of as an afterthought,” Guestrin says.
With no marketing other than academic talks, GraphLab has been downloaded tens of thousands of times and has benefited from the active engagement of major players in the technology industry, Guestrin says.
GraphLab can grind through graph datasets “orders of magnitude faster than any system out there,” he says. That’s because the underlying machine-learning algorithm is optimized to understand and exploit the structure of graph data, which leads to faster, more accurate analysis, Guestrin explains.
The next step is to make it easier for people outside the data-science priesthood to use.
“How can I make the algorithm so robust, so simple to use, yet so accessible and valuable that a company with minimal headcount in this area could get the same kind of value that a company like Google that has hired a huge number of people with this area of expertise can,” he says.
Last year, Guestrin was one of a quartet of high-profile hires by the UW computer science department after a recruiting push that included personal entreaties from Microsoft Research vice president Peter Lee and Amazon CEO Jeff Bezos, and the creation of the Amazon Endowed Professorships in Machine Learning for Guestrin and his wife, Emily Fox, now an assistant professor in the UW statistics department.
“It’s a great example of the triangulation between university research, commercial leadership, and our role in encouraging that… and then helping to bring together this nearly $7 million financing round,” says Matt McIlwain, managing director with Madrona, who is joining the board of GraphLab Inc.
Asked if there was a sense in the tech community that by recruiting Guestrin, Seattle was also recruiting a would-be startup company based on GraphLab, McIlwain says: “We certainly were aware of that potential and were hopeful that something would come together over time. Carlos is a major talent from a computer science perspective, but also one of those special people with those great skills and natural entrepreneurial knack.”
If GraphLab was born at Carnegie Mellon and raised by the open source community, it went to finishing school on Montlake. Guestrin says the technology has benefited from “a tremendous amount of engagement, contribution, and value from the UW community.”