In the Face of Genomic Data Challenges, the Cloud Keeps Us Afloat

Genomics generates data on a scale that is difficult for most people to comprehend. According to estimates published in the journal PLOS Biology in July, as many as 2 billion human genomes could be sequenced by 2025, which would far exceed the data output expected of other “big data domains,” such as astronomy, Twitter, and YouTube.

This enormous surge in sequencing, driven largely by the promise of personalized medicine, will produce close to one zettabyte of data per year within the next decade. Already in 2015, we’ve surpassed a petabyte, or 1,000 terabytes of digital data storage. For comparison, all the written works of mankind since the beginning of recorded history are believed to comprise about 50 petabytes of text. A zettabyte (a 1 followed by 21 zeros, in bytes) is vastly larger: a million petabytes, or roughly 1 billion terabytes.
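To make these jumps in scale concrete, here is a minimal Python sketch of the unit ladder the paragraph relies on. It assumes decimal (SI) prefixes, where each step is 1,000 times the previous one, rather than binary prefixes (1 KiB = 1,024 bytes); the function names are illustrative only.

```python
# Decimal (SI) storage units: each step up is 1,000x the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`, using powers of 1,000."""
    return 1000 ** UNITS.index(unit)


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Express `amount` of `from_unit` in terms of `to_unit`."""
    return amount * bytes_in(from_unit) / bytes_in(to_unit)


if __name__ == "__main__":
    print(convert(1, "petabyte", "terabyte"))    # 1000.0
    print(convert(1, "zettabyte", "terabyte"))   # 1000000000.0 (1 billion)
    print(convert(50, "petabyte", "zettabyte"))  # 5e-05 (all written text)
    print(len(str(bytes_in("zettabyte"))) - 1)   # 21 zeros
```

By this arithmetic, the estimated 50 petabytes of all recorded text is only five hundred-thousandths of a single zettabyte.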

To put this in perspective, if one byte of data were a grain of rice, a kilobyte would be a cup of rice, a megabyte eight bags, a gigabyte three truckloads, a terabyte two container ships, a petabyte enough to blanket Manhattan, an exabyte enough to cover all of the West Coast states, and a zettabyte enough to fill the entire Pacific Ocean.

In this analogy, one person’s genome amounts to half a container ship of rice.
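Read in reverse, the analogy implies a data size for one genome. The short Python sketch below takes the analogy’s own ratio (two container ships per terabyte) at face value and assumes decimal units; the result is what the analogy implies, not a measured figure.

```python
# Under the rice analogy, one terabyte of data equals two container ships.
BYTES_PER_TERABYTE = 1000 ** 4
SHIPS_PER_TERABYTE = 2


def ships_to_bytes(ships: float) -> float:
    """Convert container ships of rice back into bytes under the analogy."""
    return ships * BYTES_PER_TERABYTE / SHIPS_PER_TERABYTE


genome_bytes = ships_to_bytes(0.5)           # "half a container ship"
print(f"{genome_bytes / 1000 ** 3:.0f} GB")  # 250 GB
```

So half a container ship corresponds to roughly 250 gigabytes per genome.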

All of this sequencing data we’ll be producing is an excellent thing, as long as we have the computing power to handle it.

We’ve already seen that the richer and more detailed our data about the world’s population, the more we learn about improving our health and the health of our planet. We can correlate disease characteristics with DNA mutations, and those correlations lead to targeted medicines and better preventative care. Just look at IBM’s Watson, which not only beat human champions on Jeopardy! but is now diagnosing disease and recommending courses of cancer treatment in mere minutes.

But to truly capitalize on genomics’ potential, at the scale our not-so-distant future will demand, we must use computing tools far more powerful than those we’ve relied on in the past. Moore’s Law, the observation that the number of transistors on a processor doubles roughly every two years, is coming to an end. The limits of physics simply won’t allow us to keep extracting more power from our existing computing infrastructure.

Enter the power of hybrid cloud computing.

The hybrid cloud combines high-performance computing solutions for onsite use with “the cloud,” where data users can

Author: Pieter van Rooyen

Pieter van Rooyen is the founding CEO of Edico Genome, a San Diego-based company that developed the first next-generation sequencing Bio-IT processor, which rapidly and cost-effectively analyzes large amounts of genomic data. Throughout his career, Pieter has consistently brought to market disruptive technologies that help advance human well-being. He has more than 20 years of experience inventing, developing, and commercializing technologies in a range of industries, including semiconductors, wireless communication, healthcare, life sciences, image processing, and retail automation, and holds 110 granted patents in these areas. His passion for bringing innovative technologies to the masses has led to the creation and funding of numerous start-ups that have brought significant returns to investors. Prior to Edico Genome, Pieter was involved in the burgeoning mobile health industry, helping develop a mobile phone technology that enables health care delivery in underdeveloped communities. He also co-founded ecoATM (acquired by Coinstar), whose network of ATM-like machines lets consumers recycle their personal electronic devices for cash, and Zyray Wireless (acquired by Broadcom). With Edico Genome, Pieter’s goal is to overcome a key bottleneck in the DNA sequencing workflow to meet the needs of clinical genomics and usher in the new era of precision medicine. Pieter holds a doctorate in electrical engineering from the University of Johannesburg in South Africa. He has authored more than 50 peer-reviewed papers on topics ranging from digital communications to bioengineering.