The data intensity of genomics is difficult for most people to comprehend. According to estimates published in the journal PLOS Biology in July, as many as 2 billion human genomes could be sequenced by 2025, which would far exceed the data output expected of other “big data domains,” such as astronomy, Twitter, and YouTube.
This enormous surge in sequencing, driven largely by the promise of personalized medicine, will produce close to one zettabyte of data per year within the next decade. Already in 2015, we’ve surpassed a petabyte, which is 1,000 terabytes of digital data storage. As a point of comparison, all written works of mankind since the beginning of recorded time are believed to comprise about 50 petabytes of text. A zettabyte (a 1 followed by 21 zeros) is a million times larger than a petabyte, or roughly the equivalent of 1 billion terabytes.
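To make those unit relationships concrete, here is a quick back-of-the-envelope check in Python, using decimal (SI) units where each step up is a factor of 1,000:

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
TERABYTE = 10**12
PETABYTE = 10**15
ZETTABYTE = 10**21

assert PETABYTE // TERABYTE == 1_000            # a petabyte is 1,000 terabytes
assert ZETTABYTE // TERABYTE == 1_000_000_000   # a zettabyte is a billion terabytes

# All recorded written works (~50 PB) vs. one projected year of genomics (~1 ZB):
print(ZETTABYTE / (50 * PETABYTE))  # -> 20000.0 such "libraries" per year
```

In other words, a single year of genomic data at that scale would equal roughly 20,000 copies of everything humanity has ever written.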
To put this in perspective, if one byte of data were equivalent to a grain of rice, 1 kilobyte of data would be a cup of rice, a megabyte would be eight bags, a gigabyte would be three trucks’ worth, a terabyte would be two container ships of rice, a petabyte would blanket Manhattan, an exabyte of rice would cover all West Coast states, and a zettabyte of data would fill the entire Pacific Ocean.
In this analogy, one person’s genome amounts to a half container ship full of rice.
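Taking the analogy’s own conversion rate at face value (one terabyte equals two container ships), a half ship works out to roughly 250 gigabytes, which is in the ballpark of the raw reads and alignments produced by sequencing one whole human genome at typical coverage. The genome-size figure below is an assumption; actual sizes vary with coverage and file format:

```python
GIGABYTE = 10**9
TERABYTE = 10**12
SHIPS_PER_TERABYTE = 2  # from the analogy: 1 TB = two container ships

# Assumed raw footprint of one sequenced genome (reads plus alignments,
# ~250 GB at typical coverage; actual size varies with coverage and format).
genome_bytes = 250 * GIGABYTE

print(genome_bytes / TERABYTE * SHIPS_PER_TERABYTE)  # -> 0.5 container ships
```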
All of this sequencing data we’ll be producing is an excellent thing, as long as we have the computing power to handle it.
We’ve already seen that the more detailed and rich the data we have about the world’s population, the more knowledge we have to improve our health and the health of our planet. We can make correlations between disease characteristics and DNA mutations that lead to targeted medicines and better preventative care. Just look at IBM’s Watson, which not only beat humans on Jeopardy! but is now diagnosing disease and recommending the best course for treating cancer in mere minutes.
But to truly capitalize on genomics’ potential—at the scale that will be needed in our not-so-distant future—we must use computing tools far more powerful than what we’ve relied upon in the past. Moore’s Law, the observation that computer processing power doubles roughly every two years, is coming to an end. The limitations of physics simply won’t allow us to wring more power from our existing computing infrastructure.
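To make that doubling concrete: if Moore’s Law held, capacity would grow exponentially, so a decade would bring five doublings, a 32-fold increase. A minimal sketch of the arithmetic:

```python
# Moore's law as a simple exponential: capacity doubles every two years.
def moores_law(initial: float, years: float, doubling_period: float = 2.0) -> float:
    """Projected capacity after `years`, doubling every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)

print(moores_law(1.0, 10))  # -> 32.0, i.e. five doublings over a decade
```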
Enter the power of hybrid cloud computing.
The hybrid cloud combines high-performance computing solutions for onsite use with “the cloud,” where data users can rent storage and computing capacity on demand, scaling up or down as their workloads require.
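One common hybrid strategy keeps jobs on the onsite cluster while it has headroom and “bursts” the overflow to the cloud. The sketch below is a hypothetical illustration of that pattern, not any specific vendor’s scheduler; the capacity figure and job names are invented:

```python
# Hypothetical sketch of hybrid-cloud "burst" scheduling: keep jobs on the
# onsite HPC cluster while it has headroom, overflow the rest to the cloud.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cpu_hours: float  # estimated compute demand

ONSITE_CAPACITY_CPU_HOURS = 1_000.0  # assumed size of the local cluster

def dispatch(jobs: list[Job]) -> dict[str, list[Job]]:
    """Fill onsite capacity first; send the remainder to the cloud."""
    placement = {"onsite": [], "cloud": []}
    used = 0.0
    for job in sorted(jobs, key=lambda j: j.cpu_hours):
        if used + job.cpu_hours <= ONSITE_CAPACITY_CPU_HOURS:
            placement["onsite"].append(job)
            used += job.cpu_hours
        else:
            placement["cloud"].append(job)
    return placement

jobs = [Job("align_sample_A", 400), Job("variant_call_B", 700), Job("qc_C", 50)]
print({tier: [j.name for j in js] for tier, js in dispatch(jobs).items()})
# -> onsite: qc_C and align_sample_A (450 CPU-hours); cloud: variant_call_B
```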