Pharma Showing Interest in Open Systems for Drug Discovery

the software development process. A simple mapping of open source doesn’t exist.

The final product is not a modular piece of software, but a chemical entity, or a device. It must be manufactured at a certain level of quality, and cannot be distributed at zero marginal cost. The legal regimes are those of trade secret and patent, not copyright. And since drugs and devices cannot – yet – be tested in silico, there is a major human component not present in software – the humans who courageously volunteer for study, and their political rights.

So given this reality, where in the process can we make a rapid transition to open source approaches?

We start with the knowledge construction space around human biology. Knowledge is closer to code than anything else in the pharmaceutical value chain: it can be captured digitally, transmitted at zero marginal cost, and it’s not something that payors will reimburse as part of care. There is a massive public investment in the creation of knowledge about health and biology, in the form of data and scholarly papers. And there is momentum at state, federal, international, and institutional levels to expose knowledge for open reuse and recombination (despite some objections and lobbying efforts by knowledge brokers like publishers and scholarly societies).

Perhaps most telling, there is movement from within the industry itself towards an open source approach to biological knowledge. In the past five years, three distinct projects have been initiated from within the world’s largest pharmaceutical companies – a group not known for aggressive pro-sharing stances – to create pre-competitive spaces for data sharing and analysis.

1. First, in 2009, Merck spun out the Rosetta Inpharmatics unit into a non-profit organization called Sage Bionetworks (disclosure – I serve on the management team). Sage Bionetworks is focused on the platforms and services required for distributed knowledge creation. Sage distributes not only the knowledge modeling processes built inside of Merck, but technology platforms that allow teams of geographically dispersed scientists to collectively analyze data, that allow the tracking of individual contributions to complex projects, and that allow patients to engage directly in the research process.

Sage Bionetworks’ Synapse platform is the driver behind internal research teams publishing more than a paper per month for more than four years, as well as the Cancer Genome Atlas Pan Cancer Consortium (18 papers in press or published) and the DREAM computational challenges. This is validation that the analysis of data doesn’t need a large company’s walls and support systems. It demonstrates that tasks can be broken into modules, that contributions can be tracked and rewarded, and that the outcomes can be integrated into the larger systems of scientific knowledge distribution. All of these are key proof points in the advance of open source methods in the life sciences.

2. A second example is the release of tranSMART by Johnson & Johnson and Recombinant Data Corporation. tranSMART is an open source knowledge management platform that combines a data warehouse, access to federated sources of open and commercial databases, and a dataset explorer that integrates and extends the open source i2b2 application, Lucene text indexing, and GenePattern analytical tools.

tranSMART also enables investigators to search published literature and other text sources to evaluate their analysis in context. Data in the platform is aligned to allow identification and analysis of associations between phenotypic and biomarker data, and it is normalized to conform with CDISC and other standards to facilitate search and analysis across different data sources. tranSMART was used initially by pharmaceutical researchers in Johnson & Johnson’s Centocor R&D division, and the tranSMART Foundation recently released a major new version of the software, available under an open source license. tranSMART has had real success penetrating the industrial knowledge management market, with 20+ adoptions.

Taken together, Sage’s Synapse and tranSMART are evidence of the very real emergence of common platforms – which are a pre-condition for the kind of peer production we associate with the open source metaphor.

3. Third, the community awaits the launch of Project DataSphere from the CEO Roundtable on Cancer. Driven by Sanofi scientists, DataSphere promises a universal platform to share oncology clinical trial data sets among researchers, industry, academia, advocacy, and others in a collaborative effort that aims to transform “big data” into novel solutions for cancer patients. Since DataSphere is not yet released, we cannot examine the inner workings of its technology and governance, but early presentations indicate a model more inspired by low transaction costs than other elements of open source: a consortium to manage and broker access to data subject to both trade secret and privacy protection, with technical connections to platforms for collaborative analysis and knowledge management.

I look forward to their innovator presentation at Partnering for Cures in a couple of weeks. They will be among the 30 cross-sector programs to present their approach.

These three projects together represent a sea change in the pre-competitive landscape for pharmaceutical development. But it’s notable that each of them focuses on the biology. Whether it’s early-stage data like the TCGA, or late-stage data like clinical trials, the data is about targets and bodies – not the lead compounds. This is where we’re likely to see the most movement out of industry, and indeed this level of progress would have been unthinkable just a decade ago at the height of the first genomics bubble. But when three industry titans like Merck, J&J, and Sanofi are driving sharing, it’s fair to say the idea has traction.

In coming posts I’ll examine how non-traditional players, including patient groups and access-to-knowledge advocates, are fighting to bring open systems to the parts of the discovery process where the industry is resisting: clinical trials, lead development and optimization, and novel financing models.

Author: John Wilbanks

John Wilbanks is a data commons expert and advocate who has spent his career working to advance open content, open data, and open innovation systems. He is a senior fellow at FasterCures and chief commons officer at Sage Bionetworks. Wilbanks also serves as a senior fellow at the Ewing Marion Kauffman Foundation and as a senior advisor for big data to the National Coordination Office. Previously, Wilbanks worked as a legislative aide to Congressman Fortney "Pete" Stark, served as the first assistant director at the Berkman Center for Internet & Society, founded and led to acquisition bioinformatics company Incellico, Inc., and was vice president of science at Creative Commons. In February 2013, the U.S. government responded to a We the People petition spearheaded by Wilbanks and signed by 65,000 people, and announced a plan to open up taxpayer-funded research data and make it available for free. Wilbanks received his bachelor of arts in philosophy from Tulane University.