It seems like ancient history now, but “big data” was once a hot field with startups, investors, and big companies all buzzing with hype. Then the tech industry moved on, and marketers crowned data science and machine learning the Next Big Things (at least until blockchain takes over).
Of course, big data never really went away, though many companies did. In fact, there’s a lot happening in the field, and some of it is linked to the rise of machine learning and artificial intelligence applications. If you’re looking for a guide to what has happened, and what the future holds, you could do worse than talk to a guy named Justin Borgman.
Borgman helped build a startup out of Yale University called Hadapt during the big-data boom years of 2010-2012. Hadapt made software that combined advanced databases with the open-source analytics platform Hadoop. His company was acquired by database giant Teradata (NYSE: TDC) in 2014, and Borgman became a vice president and general manager there, leading efforts on the open-source software side.
Now, Borgman is on to something new. He’s leading a spinout of Teradata called Starburst, which is based in Boston. The new company, up and running since October, has about a dozen employees. Borgman says it is already profitable and expects to make “several million” dollars in revenue this year, selling to mid-sized and larger companies “across all sectors.”
Starburst is targeting a technical niche, but it’s a big one: providing services, support, and tools for a data analytics system called Presto. Presto was originally developed at Facebook and open-sourced to allow developers to use it broadly. In geek-speak, it’s a “distributed SQL query engine.” That means it allows users to run fast, efficient analytics on multiple types of data, wherever the data live (for example, in Hive, Cassandra, or relational databases). And that means a lot less hassle preparing the data for analysis, which traditionally has been done using the “extract, transform, load” process for managing databases.
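For readers who want a concrete picture of "querying data where it lives," here is a rough sketch of the idea. It is not Presto and does not use any Presto API. As a stand-in, it uses Python's built-in sqlite3 module to attach two separate in-memory databases and join them in a single SQL statement, with no step that copies one store into the other. The table names, the `sales` alias, and the sample rows are all made up for illustration.

```python
import sqlite3

# Stand-in for a query engine's single entry point. With Presto the sources
# would be separate systems (Hive, Cassandra, a relational database); here
# they are simply two independent in-memory SQLite databases (an assumption
# made to keep the sketch self-contained).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sales")

# Hypothetical source 1: user profiles in the main database.
conn.execute("CREATE TABLE main.users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO main.users VALUES (?, ?)",
    [(1, "ada"), (2, "grace")],
)

# Hypothetical source 2: order records in the attached database.
conn.execute("CREATE TABLE sales.orders (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)

# One SQL statement spans both stores; nothing is extracted or reloaded first.
query = """
SELECT u.name, SUM(o.amount) AS total
FROM main.users AS u
JOIN sales.orders AS o ON o.user_id = u.id
GROUP BY u.name
ORDER BY u.name
"""
totals = conn.execute(query).fetchall()
print(totals)
```

The sketch captures only the "one query, many sources" half of the description. The "distributed" half, where the work is spread across many machines, is not modeled here.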
Presto is used by big companies such as Amazon, Uber, Twitter, Netflix, and Airbnb (as well as Facebook) to do things like gain insights into user behavior, diagnose problems, and track sales and other results across huge amounts of data in many different formats. Borgman declines to name any of his customers yet, but says many Presto users are also customers of Starburst.
Daniel Abadi, a computer science professor at the University of Maryland, College Park, says in an e-mail that new methods of running queries on database systems “have become important differentiators across different vendors,” in part because of the increasing complexity of these systems. (Abadi was a scientific co-founder of Hadapt but isn’t involved with Teradata or Starburst.)
He adds, “Presto is well-positioned for placement at the forefront of this innovation, as leading tech companies and Presto users… feed back their advanced analysis practices back to the Presto development community and into the open-source project.”
After Teradata acquired his startup, Borgman says, he wanted to join forces with an existing open-source project in data infrastructure to make a big impact. So, his team approached Facebook about getting involved with and contributing to Presto, originally through Teradata.
“We wanted to become the company supporting Presto,” he says.
Now, he gets to do that as part of a smaller, independent entity. That means providing software tools and configurations to make Presto work smoothly for enterprises, as well as adding future capabilities. Starburst has a partnership with Teradata—the startup will support the big company’s customers who use Presto—but Teradata doesn’t have an ownership stake, Borgman says. And he says that (as of a few weeks ago at least) “the board is me.”
You might call it the “no VC, no board, no problem” startup model. But at some point, Starburst will probably want to scale up its operations. And it may need outside help, especially in the world of enterprise software. “I wouldn’t rule out raising capital,” Borgman says, but the company’s current setup “gives us tremendous flexibility.”
In fact, Borgman thinks too much VC money ultimately has hurt the field of big data. “Venture capitalists themselves sort of spoiled the market,” he says. “There were too many players, and business models were subsidized by VC.”
Consider the large amounts raised by the likes of Cloudera, Hortonworks, MongoDB, and Databricks. Smaller startups, like Hadapt, found it very tough to compete in the overcrowded sector, which led to a lot of consolidation and failed startups. “Some [companies] are still limping along, but they’re not on a growth trajectory,” Borgman says. “We were fortunate that we sold the company when we did.” (Meanwhile, Cloudera, Hortonworks, and MongoDB all became public companies, with market caps of over $1 billion each.)
A similar shakeout might happen in today’s market for machine learning