Waiting for the Speakularity

In late 2010, the Nieman Journalism Lab surveyed reporters for their predictions about what 2011 would bring for the future of journalism. My favorite prediction came from Matt Thompson, an editorial product manager at National Public Radio and a widely respected evangelist for digital journalism. (I happened to meet Thompson in person around the same time at News Foo, a future-of-news conference in Phoenix sponsored by Google and O’Reilly Media. He’s an amazing guy, brimming with about a dozen great ideas per minute.)

Anyway, his prediction was this: Soon—perhaps not in 2011, but in the near future—automatic speech recognition and transcription services would become “fast, free, and decent.” In a jocular reference to the Singularity, Ray Kurzweil’s label for the moment when we’ll be able to upload our minds to computers and live forever, Thompson called the arrival of free and accurate speech transcription “the Speakularity.” He predicted it would be a watershed moment for journalism, since it would mean that all the verbal information reporters collect—everything said in interviews, press conferences, courtrooms, city council meetings, broadcasts, and the like—could easily be turned into text and made searchable.

“Obscure city meetings could be recorded and auto-transcribed,” Thompson wrote. “Interviews could be published nearly instantly as Q&As; journalists covering events could focus their attention on analyzing rather than capturing the proceedings. Because text is much more scannable than audio, recordings automatically indexed to a transcript would be much quicker to search through and edit.”

The implications are obviously immense. But what excited me personally about Thompson’s concept was the prospect that, as a reporter, I might finally be able to start thinking more about the content of my interviews (analyzing) and less about taking notes (capturing).

Not that I have a problem taking notes. If I had to reveal my superpower, it’s this: I can type extremely fast. I’m talking Clark Kent fast. So fast that I walk away from most interviews with a verbatim transcript. There are always typos in the text, but nothing that can’t be easily deciphered.

It’s a great skill to have, because it means I don’t have to record interviews and waste time transcribing them later. But it comes at a cost. If I’m transcribing during an interview, my brain is divided into three separate operations. First, I’m typing whatever the speaker said a few seconds ago; to use a computational analogy, you might say my finger movements on the keyboard are drawing from the bottom of my short-term memory buffer. Second, my ears are listening to the speaker’s words in the current moment, and adding them to the top of the buffer. Third, I’m trying to comprehend the content and think ahead to the next question I want to ask.

This procedure usually works fine, but it’s exhausting. And if I stop concentrating for even a second, I suffer from buffer overflow, which is just as disastrous for me as it is for a computer program. With automatic and accurate speech transcription, I’d be able to dispense with all the typing and focus fully on the interviewee and their ideas, which would be heavenly.
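The buffer analogy above can be sketched as a small bounded FIFO queue. This is purely an illustration of the metaphor—the class and its names are hypothetical, not anything from the article: words heard go in at the top, words typed come off the bottom, and a lapse in attention while the queue is full loses words.

```python
from collections import deque


class TranscriptionBuffer:
    """Toy model of the note-taker's short-term memory buffer:
    words heard are appended at the top; words typed are drawn
    from the bottom (the oldest entries)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.words = deque()

    def hear(self, word):
        # Listening adds a word to the top of the buffer. If the
        # buffer is already full, the word is lost -- the human
        # equivalent of a buffer overflow.
        if len(self.words) >= self.capacity:
            raise OverflowError("attention lapsed: word lost")
        self.words.append(word)

    def type_word(self):
        # Typing drains the bottom of the buffer, a few seconds
        # behind what the speaker is saying now.
        return self.words.popleft()


buf = TranscriptionBuffer(capacity=4)
for w in "soon speech recognition will".split():
    buf.hear(w)
print(buf.type_word())  # the oldest word comes out first: "soon"
```

As long as typing keeps pace with hearing, the queue stays short; the overflow only happens when input continues while output stalls, which is exactly the failure mode described above.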

So, how far off is the Speakularity? The idea itself is not nearly as outlandish as the Singularity (which still has plenty of skeptics, even within the irrationally optimistic population of startup entrepreneurs). Continuous dictation software has been available since the 1990s from companies like

Author: Wade Roush

Between 2007 and 2014, I was a staff editor for Xconomy in Boston and San Francisco. Since 2008 I've been writing a weekly opinion/review column called VOX: The Voice of Xperience. (From 2008 to 2013 the column was known as World Wide Wade.) I've been writing about science and technology professionally since 1994. Before joining Xconomy in 2007, I was a staff member at MIT’s Technology Review from 2001 to 2006, serving as senior editor, San Francisco bureau chief, and executive editor of TechnologyReview.com. Before that, I was the Boston bureau reporter for Science, managing editor of supercomputing publications at NASA Ames Research Center, and Web editor at e-book pioneer NuvoMedia. I have a B.A. in the history of science from Harvard College and a Ph.D. in the history and social study of science and technology from MIT. I've published articles in Science, Technology Review, IEEE Spectrum, Encyclopaedia Britannica, Technology and Culture, Alaska Airlines Magazine, and World Business, and I've been a guest of NPR, CNN, CNBC, NECN, WGBH and the PBS NewsHour. I'm a frequent conference participant and enjoy opportunities to moderate panel discussions and on-stage chats.

My personal site: waderoush.com

My social media coordinates:
Twitter: @wroush
Facebook: facebook.com/wade.roush
LinkedIn: linkedin.com/in/waderoush
Google+: google.com/+WadeRoush
YouTube: youtube.com/wroush1967
Flickr: flickr.com/photos/wroush/
Pinterest: pinterest.com/waderoush/