3 billion chemical units of DNA, these bacterial species had just 4.1 million to 4.5 million chemical units. But the sheer mass of data wasn’t really the problem here. The strains differ by significant structural variations, which are hard to detect on today’s machines because they assemble full genomes from relatively narrow stretches of DNA, Schadt says. The PacBio machine, he says, had an advantage in that it looks at much longer stretches of DNA, called reads, which enabled it to piece together the full bacterial genomes quickly and to easily identify the subtle variations that make one strain distinct from another.
Getting the precise sequence is thought to be important for responding to infectious disease quickly, because it can provide valuable data for public health officials. In this case, some scientists hypothesized that the Haitian cholera epidemic might be coming from nearby Caribbean waters, or possibly from Latin America. By sequencing all five of those genomes for comparison, the scientists were able to say with confidence that the strain actually came from Southeast Asia, Schadt says. The scientists were also able to get a deeper understanding of how much damage the new strain is likely to inflict on people, and how likely it is to continue to spread. Based on how pathogenic the bug appears, one of the paper’s senior authors, John J. Mekalanos, is now advancing a new strategy to develop a vaccine against cholera that could be given across Latin America, Schadt says.
“Because of this strain’s increased fitness, and pathogenicity, the fear is it will dominate across Latin America,” Schadt says.
This group certainly wasn’t the only one in the world working feverishly over the past couple of months on the cholera problem. The CDC released some raw genome sequence data, obtained using Illumina machines, into a public repository known as GenBank. The Harvard Medical School/PacBio team, besides writing up its findings in today’s New England Journal of Medicine, has deposited its raw sequence data in the National Center for Biotechnology Information’s (NCBI) public database, Schadt says.
As Schadt sees it, this kind of experiment should become increasingly common in laboratories whenever new epidemics pop up. It’s the sort of thing that would have been impossible even two years ago, when getting this kind of information cost far too much and took too long.
“Given the speed and granularity in kinds of runs we can do, this kind of project now becomes possible,” Schadt says.