Genomic Advances of the 2000s Will Demand an Informatics Revolution in the 2010s

like ribosomal RNA were discovered long ago and shown to be responsible for translating protein-coding RNAs into protein, completely new classes of non-coding RNA have been discovered that are widespread and have been shown to have regulatory roles for entire networks of genes associated with disease. In fact, one particular class of non-coding RNA known as microRNA has not only been well demonstrated to affect processes that cause disease, but is now being pursued as a way to treat it as well. Despite hundreds of thousands of copies of some microRNAs existing in our cells, it was not until this past decade that we discovered these molecules and their effect on critical biological processes.

4. Third-generation DNA sequencing will enable greater insights about underlying biology.

Technologies brought to market in the last decade have enabled amazing discoveries, but they have also shown how much we still don’t know and need to learn in order to develop more effective strategies for preventing and treating disease. To truly make a difference in patient care, scientists need access to fast, accurate and comprehensive snapshots of the underlying biology of living systems. One of the more impressive technologies developed this past decade toward this end was single-molecule, real-time (SMRT) sequencing.

SMRT sequencing was invented by a group of scientists at Cornell University and is now being developed and commercialized by Pacific Biosciences, a biotechnology company formed by Stephen Turner and some of his Cornell colleagues (I joined the company this year as chief scientific officer). The technology uses optical waveguides operating below cutoff to directly observe the activity of DNA polymerase as it sequences DNA. In effect, it lets us watch nature’s own amazing sequencing engine at work as it very rapidly reads DNA. This stands in contrast to the heavily engineered second-generation systems, which have relied on brute-force approaches to sequencing rather than nature’s own highly evolved and efficient approach.

Over the next decade, SMRT sequencing will make it possible to sequence an individual’s complete DNA very quickly and at little cost. For example, current technologies take roughly one hour to sequence a single letter from a fragment of DNA, whereas SMRT sequencing can sequence roughly 20,000 letters of the fragment in the same period of time. The system has been designed to observe many of these DNA polymerase molecules at the same time, sequencing many fragments simultaneously, which will ultimately enable the observation of hundreds of gigabases of DNA per hour. This unprecedented level of speed and efficiency in genome sequencing is expected to finally make personalized medicine a reality.
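To give a rough sense of the parallelism those figures imply, here is a back-of-the-envelope sketch. It takes the roughly 20,000 letters per hour quoted above as the rate for a single polymerase and, purely as an assumption, uses 100 gigabases per hour as a stand-in for "hundreds of gigabases"; the function name is hypothetical and the result is illustrative, not a specification of any instrument.

```python
# Back-of-the-envelope sketch, not a description of any real instrument.
BASES_PER_POLYMERASE_PER_HOUR = 20_000   # rate quoted in the text
TARGET_BASES_PER_HOUR = 100 * 10**9      # assumption: 100 gigabases per hour


def polymerases_needed(target_bases_per_hour: int,
                       bases_per_polymerase_per_hour: int) -> int:
    """Return how many polymerases must be observed in parallel (rounded up)."""
    # Ceiling division using integers only.
    return -(-target_bases_per_hour // bases_per_polymerase_per_hour)


print(polymerases_needed(TARGET_BASES_PER_HOUR, BASES_PER_POLYMERASE_PER_HOUR))
```

Under these assumptions, reaching that throughput would mean observing on the order of millions of polymerase molecules at once, which is why simultaneous observation matters as much as the speed of any single molecule.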

5. Needed: Informatics innovation to translate the data deluge.

Third-generation technologies will enable sequencing every individual in large populations, and that will create unprecedented amounts of data, rivaling all other areas of science in quantity and complexity. So the real challenge in the next decade will be informatics-based. How will petabyte-scale, complex data be managed and integrated so that predictive models of disease can be constructed and routinely applied? While companies like Google routinely work with petabyte-scale data sets, the problem they have solved is far simpler than understanding how all DNA variations, RNA levels and isoforms, metabolites, and proteins interrelate across all of the different environments that give rise to life.

Only by marrying information technology to the life sciences and biotechnology will we realize the astonishing potential of the vast amounts of biological data we will be capable of generating. Such data, if properly integrated and analyzed, will enable personalized medicine strategies that help every one of us make better choices about not only how to treat disease, but how to prevent it altogether.

[Editor’s Note: This is part of a series of posts from Xconomists and other technology leaders from around the country who are weighing in with the top innovations they’ve seen in their respective fields the past 10 years, or the top disruptive technologies that will impact the next decade.]

Author: Eric Schadt

Eric Schadt is the director of the Mt. Sinai Institute for Genomics and Multi-Scale Biology in New York, and the chief scientific officer for Pacific Biosciences, a company developing new gene sequencing technologies. He is also a founding member of Sage Bionetworks, an open-access genomics initiative designed to build and support databases and an accessible platform for creating innovative dynamic disease models. Dr. Schadt joined Pacific Biosciences in May 2009 from Rosetta Inpharmatics, a subsidiary of Merck & Co., Inc. in Seattle, where he was Executive Scientific Director of Genetics. Dr. Schadt's work at Rosetta involved the generation and integration of very large-scale sequence variation, molecular profiling and clinical data in disease populations to construct the molecular networks that define disease states and link molecular biology to physiology in ways that can impact clinical medicine. Dr. Schadt has contributed to a number of discoveries relating to the genetic basis of common human diseases such as diabetes and obesity, which have been widely published in leading scientific journals. His research has provided novel insights into what is needed to master diverse, large-scale data collected on normal and disease populations in order to elucidate the complexity of disease and make more informed decisions in the drug discovery arena. Prior to joining Rosetta, Dr. Schadt was a Senior Research Scientist at Roche Bioscience. He received his B.A. in applied mathematics and computer science from California Polytechnic State University, his M.A. in pure mathematics from UCLA, and his Ph.D. in biomathematics from UCLA.