Untying the Knots in Big Data and Big Biology: Q&A with Andrew Su

data from more than 300 researchers around the world in an effort to develop computational methods to identify factors that promote or resist neurological disease.

In its announcement, the NIH said it also provided related grants focused on data discovery, career development in biomedical data science, and the development of big data courses and open educational resources. The agency projects that its “Big Data to Knowledge” (BD2K) initiative will represent a total investment of nearly $656 million through 2020, pending available funds.

Su’s research focuses on using quantitative methods in biomedical discovery. He answered some questions about the BD2K initiative by e-mail. A lightly edited transcript of our exchange follows:

Xconomy: The NIH says these grants are intended to make it easier for biomedical scientists to analyze and use genomic, proteomic, and other complex biomedical data sets. How could such data be used to treat patients in a clinical setting?

Andrew Su: There are definitely some of these BD2K proposals that seem to have direct clinical benefit. I don’t want to speak for them, but I think the overall program goals will become clearer when we have a meeting of all consortium principal investigators in Washington, D.C., next month.

X: Do you see opportunities for commercialization of new technologies arising from the work done under these grants?

AS: There is commercialization potential, both for our grant and the BD2K program as a whole. We will be developing a variety of technologies for proteomics—better identification of post-translational modifications, modeling spatiotemporal dynamics, correlating to genetic variants, and relating to cardiovascular disease.

On the other hand, a significant portion of our proposal focuses on using crowdsourcing and citizen science to organize biomedical knowledge. In those cases, the knowledge bases that result will be entirely free and open to all. While those won’t be directly commercializable, we hope they will be a foundation on which other efforts (both commercial and non-commercial) can build.

X: Can you give me some examples of bottlenecks that make it hard to apply genetic data in the diagnosis and treatment of patients?

AS: Annotating the functions of genetic variants is a big one. We are very good at identifying the presence or absence of variants in a given patient, but picking out the variant or variants that are driving disease is still very difficult. This is due to a combination of factors: the absence of functional data, and poor organization of the functional data that has been generated.

X: It doesn’t seem like $32 million is going to go very far if you’re trying to solve the problems that make it hard to use big data in a clinical setting. Is parceling out these small grants to dozens of research centers the most efficient way to address these bioinformatics bottlenecks?

AS: You touch on the classic debate of top-down NIH-driven programs versus bottom-up investigator-driven proposals. I don’t think there’s a provable right or wrong answer here, but there are certainly passionate voices on both sides.

If you’re asking whether BD2K will yield tangible benefits in four years, just based on the people involved, I’d be shocked if it didn’t. There are some top-notch people participating (both awardees and NIH).

X: How does the Scripps Wellderly Genome Resource fit into the BD2K program? Who owns that data? Is the owner willing to share/collaborate by allowing other researchers to access that database?

AS: With any Big Data initiative, the quality of the output depends a lot on the data used as input. In particular, having large, high-quality reference data sets is incredibly valuable. The Wellderly study, for example, tells us a lot about what genetic variants are and are not likely to be functional or deleterious.

Author: Bruce V. Bigelow
