Edico Genome Aims at Data Bottleneck in Genome Sequencing

With the arrival of next-generation gene sequencing machines like the Illumina (NASDAQ: ILMN) HiSeq X Ten, medicine has been moving to develop new ways of using genomic data to treat patients. Last month, for example, J. Craig Venter unveiled plans to sequence the entire genome of every patient entering the UC San Diego Moores Cancer Center as an initial goal for his latest startup, Human Longevity Inc.

At the same time, though, it’s becoming clear that generating genomic data for thousands of cancer patients involves working with very large numbers, and that means a wave of new opportunities for innovation is emerging as genomics and Big Data come together. One company moving to catch this wave is Edico Genome, a San Diego startup founded last year to fix a bottleneck in the way data generated by the HiSeq X Ten and other next-generation sequencing machines is processed.

Edico has developed a specialized computer processor for ordering the readout of nucleotides—A, C, T, or G—from short segments of DNA generated by next-generation sequencing technology so they align with a reference genome. It’s a process that genomics specialists refer to as “mapping.”

It is a Big Data problem. The human genome consists of roughly 3.2 billion nucleotide base pairs (made of that four-letter alphabet of DNA) that encode between 20,000 and 25,000 genes. Next-generation sequencing technology cuts the DNA molecule into millions of short segments to “read” the sequence and digitize the results. What comes out is a very large data file that can range from 150 gigabytes to more than 320 gigabytes. An average-size, 200-gigabyte data file would be roughly equivalent to 800 big city phone books—from the days when people used their phone books.

But the data file still consists of millions of segments of DNA that must be mapped to a reference genome. Think of throwing 800 telephone books into a paper shredder, and then trying to reassemble the millions of strips to make sense of the information.
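To make the idea of mapping concrete, here is a toy sketch in Python. It is not Edico's method or any production aligner: it assumes exact matches only, a tiny made-up reference string, and hypothetical helper names (`build_index`, `map_read`). Real mappers must also tolerate sequencing errors and genetic variation, and they work against billions of bases rather than twenty.

```python
from collections import defaultdict


def build_index(reference, k):
    """Record the start positions of every k-letter substring of the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k):
    """Return every reference position where the read matches exactly.

    The first k letters of the read are looked up in the index to get
    candidate positions, and each candidate is then checked in full.
    """
    seed = read[:k]
    hits = []
    for pos in index.get(seed, []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


# Made-up reference and reads, purely for illustration.
reference = "ACGTTAGCCGATACGTTAGG"
reads = ["TAGCCG", "CGATAC", "ACGTTAG", "GGGGGG"]
K = 4

index = build_index(reference, K)
for read in reads:
    print(read, map_read(read, reference, index, K))
```

Building the index once and then looking up each read is what keeps the work manageable: without it, every one of the millions of segments would have to be compared against every position in the reference. A read such as "ACGTTAG" that occurs in more than one place also shows why the shredded strips can be ambiguous to reassemble.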

Today, companies like Illumina use clusters of computer servers to map these random DNA segments to a reference genome, a process that typically takes about 20 hours, depending on


Author: Bruce V. Bigelow

In Memoriam: Our dear friend Bruce V. Bigelow passed away on June 29, 2018. He was the editor of Xconomy San Diego from 2008 to 2018. Bruce Bigelow joined Xconomy from the business desk of the San Diego Union-Tribune. He was a member of the team of reporters who were awarded the 2006 Pulitzer Prize in National Reporting for uncovering bribes paid to San Diego Republican Rep. Randy “Duke” Cunningham in exchange for special legislation earmarks. He also shared a 2006 award for enterprise reporting from the Society of Business Editors and Writers for “In Harm’s Way,” an article about the extraordinary casualty rate among employees working in Iraq for San Diego’s Titan Corp. He has written extensively about the 2002 corporate accounting scandal at software goliath Peregrine Systems. He also was a Gerald Loeb Award finalist and National Headline Award winner for “The Toymaker,” a 14-part chronicle of a San Diego start-up company. He takes special satisfaction, though, that the series was included in the library for nonfiction narrative journalism at the Nieman Foundation for Journalism at Harvard University. Bigelow graduated from U.C. Berkeley in 1977 with a degree in English Literature and from the Columbia University Graduate School of Journalism in 1979. Before joining the Union-Tribune in 1990, he worked for the Associated Press in Los Angeles and The Kansas City Times.