competitive sector. A bunch of startups like Mountain View, CA-based DNAnexus, Redwood City, CA-based Bina Technologies, and others, have popped up with similar ideas for ways to efficiently store genomic data and interpret it. What makes Curoverse’s system different, Berrey says, is that it runs on open-source software, rather than via a proprietary system where “you’re totally dependent on a single vendor.” Curoverse will manage a website, arvados.org, that researchers can tap into from anywhere and use to share information. This means that a geneticist could ask a question about data that sits in his or her own data center, a neighboring lab, and elsewhere simultaneously, without having to physically move any of it around, Berrey says.
“The different model of data sharing—you don’t move the data, you move the computations around—and the open-source strategy are the two things that are very different,” he says.
To be clear, Curoverse doesn’t specifically “own” Arvados in the proprietary sense: since it’s an open-source platform, anyone can download and use the source code, or the computer instructions behind it. But Curoverse will set up, operate, manage and maintain it, and charge users for the amount of computational resources and data storage they use (Berrey wouldn’t say how much). Berrey likens the approach to how Acquia has commercialized Drupal’s open-source content management system, or similarly what Rackspace is doing with the OpenStack open-source cloud operating system.
“[It’s] complex, time consuming, and requires specialized system administration and operations skills,” Berrey says. “Curoverse will provide products that make it turnkey to use Arvados without having to deal with any of the challenges associated with configuring and managing your own systems.”
Today, Curoverse is only running a private beta version of that service—a private cloud used for genomic analysis at Harvard with 300 terabytes of storage on two clusters, or data centers. Its first product, expected to be available next year, will be a platform-as-a-service, or a hosted and managed version of Arvados. Curoverse aims to then sell a set of products that enable companies and organizations to deploy clouds using Arvados.
Curoverse ultimately hopes to tap three markets. This coming year, it’ll target clinical researchers at major medical centers. It’ll then move on to pathology and independent genetic testing labs using next-generation sequencing for diagnostic tests. The big dream is for doctors someday to use Arvados as a precision medicine tool to treat their patients better—say, by using an application that picks out a specific drug based on the patient’s genetic profile.
Of course, that’s a ways away. Competition aside, Berrey acknowledges that the company is going to have to work hard to get its foot in the door and convince big medical organizations to change, and “understand the value” of using big-data computing over traditional methods. But he’s hoping the tipping point is coming.
“The good news is there’s so much new data being generated,” he says. “The systems that are installed now aren’t ready for those new data.”