The practice of encoding data in DNA molecules could be inching closer to moving out of research labs and into practical commercial use.
In the coming years, the explosion of data being generated by computing devices could outstrip the supply of hard drives needed to store it, some industry experts say. Some academic researchers and business leaders think that the solution could be to house information in lab-made DNA molecules instead of silicon. DNA’s advantages include a much longer shelf life and superior ability to pack information into less space.
But there are a number of hurdles that must be overcome before DNA becomes a feasible method for mass data storage, including how slow and expensive it is to encode data in manufactured DNA. Catalog Technologies, a young startup based at the Harvard Life Lab, says it has developed a process for encoding data in DNA that addresses both of those issues. The company announced today that it has raised $9 million so far from investors, including New Enterprise Associates (NEA), to help commercialize its DNA-based data-storage method.
“We’re really excited about making this a viable solution in the near future, starting with next year some pilot projects,” says CEO and co-founder Hyunjun Park in a phone interview. He declined to share names of pilot customers, but he says Catalog is receiving interest from government entities, nonprofits, and data storage companies.
The idea of storing data in DNA isn’t new, but the field seems to be gaining momentum as researchers find faster ways of encoding and reading ever-increasing amounts of data in DNA. Experiments completed in the past few years have demonstrated the ability to encode digital data (the sequences of zeroes and ones that make up computer text, images, and audio files) into strands of synthetic DNA, made up of sequences of nucleotide bases represented by the letters A, C, G, and T. Storing data in DNA involves converting digital data into DNA code, and then synthesizing strings of DNA molecules with that code. For the past several years, researchers have been working to speed up the DNA synthesis step and lower its cost.
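To make the conversion step concrete, here is a minimal sketch that maps every two bits of a file to one of the four bases. The mapping table and the function names are assumptions chosen for illustration, not a scheme attributed to any of the groups mentioned here; real encodings typically add error correction and avoid troublesome sequences such as long runs of a single base.

```python
# Hypothetical 2-bits-per-base mapping, for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Convert raw bytes to a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Reverse the mapping: four bases become one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(bytes_to_dna(b"Hi"))  # CAGACGGC
```

Under this mapping each byte becomes four bases, so the length of the DNA code grows in direct proportion to the size of the file. That code would then have to be synthesized as physical molecules, which is the slow and costly step.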
In 2012, researchers who included renowned Harvard professor George Church reported that they had encoded a 5.27-megabit book in DNA and read it back using a DNA sequencing machine. Since then, demonstrated storage capacity has grown. In a paper published in February, Microsoft and University of Washington researchers reported that they had stored 35 distinct digital files in DNA (more than 200 megabytes of data), including the United Nations’ “Universal Declaration of Human Rights” in more than 100 languages and the music video for OK Go’s song “This Too Shall Pass.” The Microsoft and UW team also said they improved methods of retrieving that data.
This month, Church and several other researchers published results of a new method that uses a DNA-building enzyme instead of traditional chemical approaches to rapidly synthesize DNA. UCLA assistant professor Sri Kosuri, a synthetic biology researcher who didn’t work on the project, tweeted that the approach might help improve the speed and cost of DNA synthesis, as well as the speed of reading the encoded information. (The new paper has not been peer-reviewed.)
Catalog says it has developed faster and cheaper methods of building custom DNA for data storage purposes. The startup says the key to its approach is separating the process of synthesizing DNA molecules from the process of encoding the digital data. Park says Catalog’s method involves purchasing large quantities of small DNA fragments—about 20 to 30 base pairs long—from synthetic DNA suppliers. Catalog designed a machine that can dispense and stitch the DNA fragments together in programmable ways. The idea is that Catalog’s process uses a relatively small number of DNA molecules—fewer than 200—which can be combined in an exponential number of ways, Park says. The process requires less DNA synthesis, which is the “expensive and slow part of the work,” he says.
Victor Zhirnov, chief scientist of the nonprofit Semiconductor Research Corporation in Durham, NC, says it sounds like Catalog is using a so-called “library approach,” which involves “encoding information by taking a combination of DNA molecules from a defined lexicon of molecules.”
“By doing this, they don’t need to synthesize new DNA for every new piece of information to store. Instead they just have to remix their pre-fabricated DNA,” Zhirnov says in an e-mail to Xconomy. (His research interests include DNA data storage, and he says he has no ties to Catalog.)
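The library idea can be illustrated with a toy model. Catalog has not published its method, so everything in this sketch is an assumption: the library is imagined as 16 slots with 12 interchangeable pre-made fragments per slot (192 fragments in total, under the 200 Park mentions), and a number is stored by choosing one fragment for each slot.

```python
# Toy model of a "library approach". The slot layout is an assumption
# made for illustration; it is not Catalog's published design.
POSITIONS = 16  # hypothetical number of slots in one assembled molecule
CHOICES = 12    # hypothetical number of pre-made fragments per slot
LIBRARY_SIZE = POSITIONS * CHOICES  # 192 distinct fragments to purchase


def pick_fragments(value: int) -> list[int]:
    """Write value in base CHOICES; digit i selects the fragment for slot i."""
    if not 0 <= value < CHOICES ** POSITIONS:
        raise ValueError("value does not fit in one assembled molecule")
    picks = []
    for _ in range(POSITIONS):
        value, digit = divmod(value, CHOICES)
        picks.append(digit)
    return picks


def read_fragments(picks: list[int]) -> int:
    """Recover the stored number from the fragment chosen in each slot."""
    value = 0
    for digit in reversed(picks):
        value = value * CHOICES + digit
    return value


picks = pick_fragments(2018)
assert read_fragments(picks) == 2018
```

In this toy layout, 192 purchased fragments can be assembled into 12 to the 16th power distinct molecules, which is more than 10^17 combinations, without synthesizing any new DNA. That is the sense in which a small library yields an exponential number of possibilities.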
Park claims that by next year, Catalog’s machine will be able to encode 1 terabyte of information per day in DNA, at a cost of several thousand dollars. Current standard methods of encoding data in DNA would cost billions of dollars and take several weeks to accomplish the same task, Park says.
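For a sense of scale, the short calculation below converts the 1-terabyte-per-day target into a sustained write rate. It assumes a decimal terabyte (10^12 bytes) and a machine running around the clock; both are assumptions, since the article does not specify either.

```python
# Back-of-the-envelope conversion of "1 terabyte per day" to a write rate.
# Assumes a decimal terabyte and continuous 24-hour operation.
BYTES_PER_TERABYTE = 10 ** 12
SECONDS_PER_DAY = 24 * 60 * 60

bits_per_second = BYTES_PER_TERABYTE * 8 / SECONDS_PER_DAY
print(f"{bits_per_second / 1e6:.1f} megabits per second")  # 92.6 megabits per second
```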
Catalog’s goals for the performance of its system are ambitious, but “not unreasonable,” Zhirnov says. Whether Catalog’s approach “can be done in an economically viable way—it remains to be seen,” he says. “I find their approach interesting and look forward to seeing the results,” he adds. Catalog hasn’t published any peer-reviewed studies of its methods, Park says.
By comparison,