ENCODE and the Truth

is just in the abstract.

Graur (I’ll italicize papers to prevent confusion with the author) is a snarky paper throughout, causing titters amongst many scientists. One compared it to a vulture picking apart a wildebeest carcass.

A vivid metaphor, but one usually applied to movie reviews, not dry research papers. There are reasons most scientific discussions are impersonal and use the passive voice.

The peculiar nature of the paper – its direct attack on the work of others, its use of first person, its heavy discussion of semantics – makes it more a personal argument than a logical one.

To me, it is harmful in its hubris, in its personal tone and in its seeming surety that ENCODE is not only wrong science but bad science, done by bad scientists. It disparages both the results and the researchers.

Now, that is just my opinion and has nothing to do with the data. Nature always wins and the truth will come out, without regard for the character of the researchers or how personal their arguments are.

So let’s look at the data.

A major area of disagreement between Graur and ENCODE is something Mark Minie focused on in our Xconomy article.  ENCODE resolves some complex issues by “re-defining the gene of a multi-cellular organism as a simple, easily studied biomolecular unit—the RNA transcript of the DNA sequence.”

Graur disagrees with this, feeling that transcription is not sufficient to show function. For example, from the paper (my bold):

“The human genome is rife with dead copies of protein-coding and RNA-specifying genes that have been rendered inactive by mutation. These elements are called pseudogenes (Karro et al. 2007). Pseudogenes come in many flavors (e.g., processed, duplicated, unitary) and, by definition, they are nonfunctional. The measly handful of “pseudogenes” that have so far been assigned a tentative function (e.g., Sassi et al. 2007; Chan et al. 2013) are, by definition, functional genes, merely pseudogene look-alikes. Up to a tenth of all known pseudogenes are transcribed (Pei et al. 2012); some are even translated in tumor cells (e.g., Kandouz et al. 2004). Pseudogene transcription is especially prevalent in pluripotent stem cells, testicular and germline cells, as well as cancer cells such as those used by ENCODE to ascertain transcription (e.g., Babushok et al. 2011). Comparative studies have repeatedly shown that pseudogenes, which have been so defined because they lack coding potential due to the presence of disruptive mutations, evolve very rapidly and are mostly subject to no functional constraint (Pei et al. 2012). Hence, regardless of their transcriptional or translational status, pseudogenes are nonfunctional!”

In Graur's view, then, a pseudogene can never be functional, even if it is transcribed or translated. Let’s look at this definition and how it might affect real-world decisions – such as requests for funding.

Say a researcher came to you with a proposal examining pseudogenes. Searching genomic databases, using some algorithmic magic, they found at least 15,000 pseudogenes in the human genome. About 1,500 of these “dead copies of protein-coding and RNA-specifying genes” might be transcribed into RNA.

Do any of the pseudogenes have a biological function? There are no data yet. They want to look and need money to continue. Would you fund it?

Following the Graur view, this proposal should be denied, because it looks at something worthless. A pseudogene is a dead copy of a functional gene, even if it is transcribed. Looking for function is bad science, done by poorly trained technicians.

Now, if that proposal were taken to ENCODE, they would have a different view. Transcription is very important. A transcribed pseudogene could be functional; let’s find out what it really does.

I am glad that Graur devotees were not responsible for funding such proposals, and so were not able to deny the projects money and keep us from learning more about Nature.

Because guess what? When we look at supposedly nonfunctional pseudogenes, we find function.

The very Chan paper Graur pooh-poohs above as “measly” is the example I just used.

The authors hypothesized that

Author: Richard Gayle

Richard Gayle is the founder and president of SpreadingScience, a company focused on leveraging new online technologies to increase the rate at which innovation diffuses through an organization. Previously, he spent five years as Vice-President, Research and as a board member (a seat he still holds) at Etubics Corporation, a Seattle biotech developing novel vaccines for a range of human diseases. Richard moved to Seattle in 1986 to join Immunex as a staff scientist, where he worked for 16 years. In addition to his research obligations, which developed technology critical for the company’s research investigations, he was also responsible for the creation and management of the first intranet at Immunex. After leaving Immunex in 2002, he worked on the Business Development committee of the Washington Biotech and Biomedical Association, which coordinated InvestNW and organized several events sponsored by the WBBA. He also currently sits on the board of the Sustainable Path Foundation, which informs the Puget Sound community in areas of sustainability and human health by using scientific understanding and systems thinking. Richard received his BS from the California Institute of Technology and his PhD in biochemistry from Rice University.