When Data Meets Genomics, Where Does Computing Power Come From?

San Antonio—Even as genome sequencing has risen in prevalence and declined in cost during the last decade, it still has many scientists perplexed from a data standpoint.

As it becomes more common for pharmaceutical and biotech companies to use data from sequenced genomes for drug testing, the question of how they will effectively process that immense amount of information remains. That discussion was core to much of the Big Data & Data Analytics conference hosted this week by the University of Texas at San Antonio. Companies like Janssen, the pharmaceutical division of New Brunswick, NJ-based healthcare giant Johnson & Johnson, have developed internal teams to create solutions.

“This is something we need to work with partners in biotech, startups, academia, to enable the effective capture, interpretation and assimilation of this data,” said Guna Rajagopal, the vice president of Janssen’s computational sciences division, whose team does that data work for the drug developer.

For example, two years ago, Rajagopal’s 60-person division took the results of a drug’s effect on 500 people whose genomes it had sequenced. That produced 90 terabytes of data, which Janssen stored with Amazon. Analyzing that data, which Janssen did at the San Diego Supercomputer Center at the University of California, San Diego, required eight weeks and 257 terabytes of computing power, he said. The company also worked with Intel on the project, he said.

Rajagopal didn’t reveal the results, or the name of the drug, citing restrictions from his company’s legal office. The point of his example: As sequencing becomes more prevalent in drug development, those partnerships among people in different industries are going to become more important, he said.

“The data must flow across all the organizations we have so the right data comes to the right people to make the right decision,” Rajagopal said. “If you’re talking about 1 million genomes or 10,000 genomes, how are we going to address this bigger challenge?”

The conference itself was a confluence of academics and enterprise, of technology and life sciences. At the University of Texas MD Anderson Cancer Center in Houston, researchers in the computational biology and bioinformatics department have started archiving data, which makes it harder to easily access, because of the sheer volume of data they’re bringing in, according to John Weinstein, the department chair.

“It’s harder to get to, and to get to quickly, but at least it will be preserved,” Weinstein said. “The question is, what comes next? I’m sure there are those who at this meeting know what availability [of storage] there will be.”

One group might be another institution in the University of Texas System. The Texas Advanced Computing Center, which is based on the system’s flagship campus in Austin, has built a computing system it calls Stampede that has 100,000 cores (or processing units) of computing power and 14 petabytes of storage (one petabyte is 1 million gigabytes).

“We’ve got this tsunami coming of data. This is where the Texas Advanced Computing Center wants to come to help,” said Niall Gaffney, the center’s director of data intensive computing. “Stampede is the sort of system that might be able to tackle this million genome problem.”
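
To put the "million genome problem" in rough numbers, the figures quoted above can be extrapolated. The sketch below is a back-of-envelope estimate, not anything presented at the conference. It assumes data volume grows linearly with the number of genomes, at the same rate as Janssen's 500-person study (90 terabytes, which may include more than raw sequence data), and it uses decimal units to match the article's definition of a petabyte. The function and variable names are illustrative.

```python
# Back-of-envelope scaling from figures quoted in the article.
# Assumption: data volume scales linearly with the number of genomes.

TB_PER_PB = 1_000          # decimal units: 1 PB = 1,000 TB = 1 million GB

STUDY_GENOMES = 500        # people sequenced in the Janssen study
STUDY_DATA_TB = 90         # data the study produced, in terabytes
STAMPEDE_STORAGE_PB = 14   # Stampede's quoted storage, in petabytes


def projected_storage_pb(genomes: int) -> float:
    """Estimate storage in petabytes for a given number of genomes,
    assuming each contributes as much data as in the 500-person study."""
    return genomes * STUDY_DATA_TB / STUDY_GENOMES / TB_PER_PB


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        pb = projected_storage_pb(n)
        ratio = pb / STAMPEDE_STORAGE_PB
        print(f"{n:>9,} genomes: ~{pb:,.1f} PB "
              f"({ratio:.1f}x Stampede's storage)")
```

Under those assumptions, 10,000 genomes would need about 1.8 petabytes, while 1 million genomes would need about 180 petabytes, roughly 13 times Stampede's quoted 14 petabytes of storage.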

Author: David Holley

David is the national correspondent at Xconomy. He has spent most of his career covering business of every kind, from breweries in Oregon to investment banks in New York. A native of the Pacific Northwest, David started his career reporting at weekly and daily newspapers, covering murder trials, city council meetings, the expanding startup tech industry in the region, and everything between. He left the West Coast to pursue business journalism in New York, first writing about biotech and then private equity at The Deal. After a stint at Bloomberg News writing about high-yield bonds and leveraged loans, David relocated from New York to Austin, TX. He graduated from Portland State University.