There is a major transformational step underway for managing the growing amount of human genomic data. To date, the focus has been on amassing databanks of genomes and then developing new tools to analyze this information. In essence, the emphasis has been on breaking bottlenecks for analyzing the data.
Now, there is an opportunity to take progress in a new direction, to move beyond amassing genomic data and enable researchers to share genomic knowledge worldwide, and ultimately, at the point of patient care. This next era will dramatically change how genomic data can be accessed, shared, and interpreted on a global scale.
The demand for distributed, global access to information is evident from our day-to-day interaction with data. Data now informs nearly everything we do, from using Google Maps for directions to checking the weather forecast, and we have come to expect that information in real time, through a browser. Extending that kind of access to large sets of genetic data represents the next wave of progress in the genomic era.
Many of us in this field are captivated by the opportunities to use genomic insights to advance medicine. So, let’s take a look at the progress made in managing these data for clinical use:
Early genome sequencing and analysis: The concept of genomics in medicine was catalyzed by the Human Genome Project in the early 2000s. The research community, governments, and industry then spent nearly a decade learning how to sequence genomes, extract useful information from them, and build equipment that made the process more efficient. That progress led to a host of important discoveries about the genome and its impact on disease risk and treatment response. I was personally involved in these early pioneering efforts, working with deCODE Genetics on a population-scale platform in Iceland that has compiled the largest collection of whole-genome variation data in the world. Even back then, that genomic engine identified scores of important genetic variations associated with common diseases, groundbreaking discoveries that laid the foundation for gauging the inherited risk of conditions like prostate cancer and heart attacks.
Mainstreaming genome sequencing: Building on that experience and early success, industry leaders set out to make genomics vastly more accessible with technologies that made sequencing cheaper, faster, and easier. We’ve now reached the threshold of the $1,000 genome, enabling major centers to integrate sequencing into their work and use genome-guided information to advance their research and generate new ideas about the genomic causes of a range of challenging diseases.
Analyzing genomic data: Recent years have brought a whole new generation of genomics players harnessing this progress, with a range of companies introducing analytics tools and software to aid sequencing and analysis. This flurry of activity has produced techniques with a range of purposes, from identifying single variants to detecting broad patterns, that can affect everything from a single rare disease to an entire therapeutic category. While many of these tools will continue to serve niche applications, some (particularly those that scale successfully) will likely become industry standards in the near future, capable of managing large-scale datasets and supporting broad, cross-institution genomic research.
Amassing genomic datasets and liberating them: Today, as organizations amass vast amounts of genomic data, they are looking for ways to share this information and collaborate to consolidate the data and generate reliable, consistent conclusions about disease causes, risks, and treatment responses. Based on these efforts, new databases are emerging that catalogue genetic variation, offering invaluable “Big Data”-style resources for the world’s research community. Just two weeks ago at the American Society of Human Genetics’ annual meeting, several new