Little Insights About ‘Big Data’

Replacing the term “big data” with “big enough data” made much of what I was trying to learn fall into place. It isn’t the absolute size of the data that matters; it is having enough data to get the results you need.

Having said that, a number of exciting new technologies are fueling impressive breakthroughs in the management and analysis of data. Right now these tend to be platforms that mostly fall into the “Curate” category (see below), since that foundation is necessary to support the new analytics solutions yet to come.

The most helpful insight I had was breaking the big data product landscape down into three rough categories. In keeping with the fashion of the alliterative triple used for the “Three V’s” of big data (Volume, Velocity, Variety), I created my own “Three C’s” of big data product categories.

—Collect: Products that participate in the acquisition of data, including the hardware (sensors, networks, CPU, memory, disks) and pre-collected data from third-party information suppliers.

—Curate: This is the bulk of current offerings in big data and involves platforms to administer vast amounts of data. Hadoop is probably the most famous, but any file system, database, cloud, or stream-processing platform that can store, move, process, or filter data that doesn’t fit on a single machine falls into this category.

—Consume: The ultimate purpose of using any big data solution is to produce a result that allows a company to more confidently take actions to achieve its goals. To this end, we have traditionally used business intelligence, visualization, and statistical tools to analyze data. Now these tools are being adapted to operate on a much larger scale.

The new technological area in this category is machine learning (aka machine intelligence), which aims to automate the discovery of patterns in data that is too complex or too large for humans or traditional approaches to tackle.
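To make the idea of automated pattern discovery concrete, here is a minimal sketch in Python, assuming the scikit-learn library is available; the toy data and library choice are illustrative, not tied to any specific product. Given only raw points, the clustering algorithm discovers the groupings on its own, with no labels or hints supplied:

    import numpy as np
    from sklearn.cluster import KMeans

    # Toy data: two obvious groups of 2-D points (purely illustrative).
    X = np.array([
        [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # points near (1, 1)
        [8.0, 8.2], [7.9, 8.1], [8.3, 7.8],   # points near (8, 8)
    ])

    # Ask k-means to discover 2 groups; no labels are provided.
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(model.labels_)           # e.g., [0 0 0 1 1 1]
    print(model.cluster_centers_)  # the two discovered group centers

On six points a human could spot the pattern instantly; the value of the automated approach is that the same procedure works when there are millions of points in hundreds of dimensions.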

Few products in the big data space fall squarely into just one of these three categories; they are best thought of as points on a spectrum.

One last minor insight was around talk of analyzing unstructured data. I was puzzled: If your data had no structure, what possible analysis could you do? By its very nature analysis involves structure.

What is actually being claimed is that new techniques can now take in data that has loose structure (e.g., log files) or implicit structure (e.g., natural language), extract that structure rapidly and at scale, and make it available for analysis while it is still useful.
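As a toy illustration of extracting loose structure, here is a minimal sketch in Python; the log format and field names are hypothetical, but the idea is the same one these techniques apply at far larger scale:

    import re

    # Hypothetical log line: "2015-03-14 09:26:53 GET /index.html 200 512"
    LOG_PATTERN = re.compile(
        r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
        r"(?P<method>\w+) (?P<path>\S+) (?P<status>\d{3}) (?P<bytes>\d+)"
    )

    def parse_line(line):
        """Pull the implicit structure of one log line into a dict."""
        match = LOG_PATTERN.match(line)
        return match.groupdict() if match else None

    print(parse_line("2015-03-14 09:26:53 GET /index.html 200 512"))
    # {'date': '2015-03-14', 'time': '09:26:53', 'method': 'GET',
    #  'path': '/index.html', 'status': '200', 'bytes': '512'}

Once the loose text has been turned into named fields like these, ordinary analysis tools can go to work on it.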

Hopefully some of these little insights about big data will help you understand more of what you are hearing about the exciting new technologies, products, and companies involved in extracting value from the massive data stores and streams the world has available today.

Author: Art Mellor

Art Mellor is a software engineer at Skelmir, which develops Java-language virtual machine technology to help customers bring their products to market. From 2012 to 2015, Mellor was CEO of Zero Locus, a Milwaukee startup now operating as Functor Reality that creates predictive analytics software for large data sets using probabilistic graphical models. Mellor has spent more than 25 years in the startup world, having founded or co-founded four startups in the technology space and one biotech nonprofit, and worked at three other technology startups. His previous startups include a venture-backed ISP network configuration company, Gold Wire Technology; a bootstrapped network protocol test company, Midnight Networks; a computer and training consultancy, THINK Consulting; and the world's largest multi-disciplinary, open-access biorepository for multiple sclerosis, Accelerated Cure Project. He has advised numerous startups as a mentor, adviser, or board member; written hundreds of articles, newsletters, and book chapters; and been a regular speaker for entrepreneurial classes at MIT, Harvard, Babson, Olin, and other schools.