Innovation Outlook for 2015: Tackling Big Data Variety

In 2015, look for the innovation community to be talking about Big Data Variety: the problem, how to fix it, and how to make money doing it.

Companies have invested roughly $3-4 trillion in enterprise software over the last 20 years, with Gartner forecasting $320 billion in 2014 alone. A lot of that investment has gone into single systems and applications—from Oracle and SAP to proprietary enterprise resource planning and product lifecycle management systems to (more recently) Hadoop and Hive. The good news: organizations are sitting on vast reserves of diverse, potentially invaluable corporate data. The bad news: they can’t get at much of that value because it’s locked in data silos tied to these systems, applications, organizations, individuals, or all of the above.

Welcome to the world of Big Data Variety. Organizations now want to use this data broadly—along with all external data sources—for analytic applications. But most organizations don’t even know what they have—data sources, entities, and attributes—let alone how to get them to work together at scale to power new insights and discover long-tail business opportunities.

Meanwhile, companies are also investing heavily in big data, which Gartner estimates at $44 billion in 2014. Yet today 85 percent of that big data investment is going toward IT services, not software. In an HBR post, Mahesh S. Kumar wrote that “the disproportionate spending on services is a sign of immaturity in how we manage data,” citing Marc Andreessen’s seminal argument that for each new technology wave, the money eventually shifts to software.

Opportunity is knocking: Clearly, we need innovation in software that radically improves the connection, enrichment, and management of the full volume and variety of an enterprise’s data sources. Most of the high-profile software innovation so far (for example, Hortonworks/Hadoop) has targeted storing and aggregating data. The nastier problem—and by far the bigger opportunity—is connecting data silos semantically at scale, shortening the time to analytics, and discovering the data in an enterprise that can dramatically improve signals in predictive models.

This isn’t a problem that will be solved overnight, and it’s going to get worse for businesses (almost every investment in a new, single-vendor system creates a new data silo). The cultural change may be the biggest challenge: realizing that the solution is NOT to throw more IT people or consultants at the problem. Or even to throw data scientists—the new unicorns/rock stars—at it.

When you think about it, our future depends on the ability to harness Big Data Variety. We need to be able to quickly ask—and answer—big questions. Questions ranging from “How can I get the best price (or the most uninterruptible supply source) on an essential part from my global supply chain?” to “Which of my 8,000 research chemists is furthest along working on a molecule that could accelerate a cure for _____?”

A year from now, I think we will look back on some excellent progress here.

[Editor’s note: To tap the wisdom of our distinguished group of Xconomists, we asked a few of them to answer this question heading into 2015: “What will everyone in the innovation community be talking about a year from now?” You can see other questions and answers here.]

Author: Andy Palmer

Andy Palmer is a serial entrepreneur who specializes in accelerating the growth of early-stage, mission-driven startups. Andy has helped found and/or fund more than 50 innovative companies in technology, health care, and the life sciences. Andy’s unique blend of strategic perspective and disciplined tactical execution is suited to environments where uncertainty is the rule rather than the exception. Andy has a specific passion for projects at the intersection of computer science and the life sciences. Most recently, Andy co-founded Tamr, a next-generation data curation company, and Koa Labs, a startup club in the heart of Harvard Square, Cambridge, MA.