One of the Boston tech scene’s most dynamic duos is at it again. Yes, Andy Palmer and Michael Stonebraker are coming out of stealth with their latest company.
Cambridge, MA-based Tamr, formerly known as Data Tamer, makes software aimed at helping big companies manage and connect their many data sources; the idea is to give enterprises a faster way to access the right information to make business decisions.
Tamr also says it has raised $16 million-plus from big-name investors Google Ventures and New Enterprise Associates. Venture capitalists Rich Miner and Peter Barris have joined Tamr’s board of directors, which is chaired by database expert Jerry Held.
The company’s timing could be good, now that a lot of the marketing hype around “big data” has died down. And, more importantly, Tamr seems to be solving a real business problem with some market upside.
The company is led by chief executive Palmer, who was the co-founder and founding CEO of Vertica Systems (now part of Hewlett-Packard). In recent years, he has been an angel investor in companies such as Cloudant, CloudSwitch, and VoltDB. (He made Xconomy’s list of top angels in New England in 2012.)
He and co-founder Stonebraker, an adjunct professor at MIT’s Computer Science and Artificial Intelligence Lab, have previously collaborated on Vertica, VoltDB, and Paradigm4. Stonebraker has helped start other companies in the Boston area, such as Goby (bought by TeleNav) and StreamBase Systems (bought by Tibco).
If there’s a common theme among their startups, it’s big data applied to big-company problems. In Tamr’s case, some more background is in order.
Palmer (pictured) is a rare breed of tech executive who also knows the healthcare world. He ran data engineering at Novartis and served as chief information officer at Infinity Pharmaceuticals, so he understands the “data curation” problem firsthand: think of thousands of bench scientists putting their experimental data into spreadsheets to be analyzed, and the company decision-makers having to sift through all the different data sources and formats. The upshot is that a lot of useful information in the “long tail” never gets looked at.
That’s because the traditional way of integrating data from many sources, known as “extract, transform, and load,” or ETL, requires a programmer to handle each data source separately. The approach may work for a few dozen data sources, say, but it breaks down when you have thousands.
Stonebraker (also pictured), a longtime UC Berkeley professor who developed the Ingres and Postgres relational database systems, saw a way to solve this scalability problem from the bottom up. His experience with the Web startup Goby taught him that if you’re dealing with thousands of data sources—in Goby’s case, scraping 80,000 websites looking for events and attractions—you need to incorporate statistics, and use human experts only when they’re absolutely necessary.
Tamr’s technology is based on collaborative research done at MIT, UC Berkeley, Brown University, Brandeis University, and the Qatar Computing Research Institute. It uses machine-learning algorithms and statistics to integrate a huge number of data streams, with a touch of human-expert guidance to keep the algorithms on track; the owner of a particular piece of data may get an e-mail request for clarification, for example.
Overall, data integration and curation entail understanding how all the records relate to one another, paring down redundant data, flagging items that have typos, and generally prepping the information so it can be used downstream. If it works, the result should be a big reduction in time and cost for customers, and a lot of added value in data that was previously hidden.
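To make the idea concrete, here is a minimal Python sketch of that kind of triage. It is not Tamr's code or algorithm: the function names, the two thresholds, and the use of simple string similarity in place of a trained machine-learning model are all assumptions made for illustration. Pairs of records that look nearly identical are merged automatically, borderline pairs are set aside for a human expert, and everything else is treated as distinct.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical thresholds, chosen only for this illustration.
AUTO_MERGE = 0.90   # at or above this score, treat two records as duplicates
ASK_EXPERT = 0.70   # between the two thresholds, ask a human to decide


def similarity(a, b):
    """Return a 0..1 similarity score for two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def triage(records):
    """Split candidate record pairs into automatic merges and expert-review requests."""
    merged, review = [], []
    for a, b in combinations(records, 2):
        score = similarity(a, b)
        if score >= AUTO_MERGE:
            merged.append((a, b))      # confident match: no human needed
        elif score >= ASK_EXPERT:
            review.append((a, b))      # uncertain: route to the data owner
        # lower scores are treated as distinct records
    return merged, review


if __name__ == "__main__":
    names = ["Acme Corp", "acme corp ", "Acme Corporation", "Zyx"]
    merged, review = triage(names)
    print("auto-merged:", merged)
    print("needs expert review:", review)
```

Comparing every pair, as this sketch does, would not hold up at thousands of sources; it is only meant to show how statistics can handle the easy cases and leave the ambiguous ones to people.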
“I’m only interested in game-changing technologies,” Stonebraker says. “Life is short. This is a complete game-changer in the data curation market.”
That remains to be seen. But Tamr is getting some