Big data is a catchphrase that’s getting stale and almost too broad to be meaningful. But behind the buzzword are some very real technology trends, and they bode well for Seattle.
Matt McIlwain, managing director at Madrona Venture Group, the region’s foremost investor in information technology companies, gave me a quick tour through the layers of what he calls the “dataware” technology stack—a rough outline of the interlocking technologies businesses are using to derive value from the promise of big data. The Seattle area has every reason to own large pieces of it.
“Each of these layers are very big and in the next 10 years they’re going to play out as really interesting categories,” says McIlwain, who will offer more insights on the topic at Xconomy Xchange: Beauty and the Data Beast—Seattle Innovation Stories on Tuesday, Nov. 18.
McIlwain will be joined on stage by Carlos Guestrin, a University of Washington machine learning expert and founder and CEO of GraphLab, a company right in the middle of the dataware stack.
But let’s start at the bottom, with the enabling infrastructure. Companies are refining new forms of data storage that can handle huge volumes and varieties of data, from tabular data to text to images. Several of these technologies are open source and grew out of custom systems that Internet giants, including Google and Facebook, built to handle their own needs. They are now becoming broadly available.
Cloudera, MapR, and Hortonworks are building commercial software and consulting businesses on top of the Hadoop and MapReduce technologies for storage and processing. MongoDB is commercializing a document database technology of the same name. Other modern data management and processing systems include Cassandra and Spark.
We’re obviously lumping a lot into this infrastructure layer for simplicity’s sake. Suffice it to say this is where the data lives. But that “where” increasingly means thousands of distributed servers in the public cloud-computing datacenters of Amazon Web Services and Microsoft Azure—provided by Seattle’s twin technology pillars—and their competitors.
“What’s really magical in this world of dataware is that you can take that data and clean it and normalize it, and then use it on the fly in increasingly intelligent ways,” McIlwain says.
One level up the stack from the enabling infrastructure is the data intelligence layer, McIlwain says.
That’s where the likes of Madrona portfolio companies GraphLab and Algorithmia, another Seattle startup, are focused.
“They’re trying to help you create, essentially, data models and what are called data pipelines, such that you can ultimately ingest that data that’s living down in the infrastructure and get some kind of intelligence out of it, to build some kind of a predictive system or … or just a better model for you to have more insight that you can make decisions against,” McIlwain says.
The big cloud providers are starting to offer machine learning as a service, too. Earlier this year, Microsoft introduced Azure Machine Learning, and Amazon is expected to follow suit, connecting predictive analytics directly to their cloud storage services.
These data pipelines might output to visualization tools from the likes of Seattle-based Tableau Software or Power BI from Microsoft.
Or they might be pushed directly into an application, where the output triggers an action, such as recommending a movie to watch on Netflix, queuing up the next song in Pandora, or showing relevant homes for sale on Zillow.
These data-infused applications form the top layer of the dataware stack, McIlwain says. And there are hundreds of examples, such as Workday, which last week rolled out predictive analytics applications that it says can look at a company’s historical data to identify a top-performing employee at risk of leaving.
That’s an example of a vertically focused, data-infused application, McIlwain says. These are companies that have essentially built their own dataware technology stacks or licensed pieces from other vendors to serve specific industries. Another example is Bellevue, WA-based Apptio, which gives companies a data-driven view of their IT operations so they can better manage them as a business, says McIlwain, who sits on Apptio’s board. Likewise, in online marketing, companies are moving from the marketing automation provided by the likes of Eloqua, HubSpot, and Marketo to marketing intelligence from Infer, 6Sense, and Bizible, a Seattle-based company that is also in Madrona’s portfolio.
The companies at the data intelligence layer, meanwhile, are providing tools that are broadly applicable across industries. GraphLab, for example, helps teams at companies including ExxonMobil and Adobe build prediction or recommendation systems. Algorithmia aims to be a place for discovering, sharing, and licensing algorithms such as those that make up the data pipelines.
McIlwain notes that Seattle’s dataware assets lie not just in the startups and technology giants, but also in researchers at the Allen Institute for Artificial Intelligence, headed by machine learning expert Oren Etzioni, and at the University of Washington, where the already deep bench of big data and machine learning talent was bolstered by new hires in natural language processing earlier this year.