Waiting for the Speakularity

Dragon Systems, which is now part of Nuance Communications. The problem is that it has never quite met all three of Thompson’s criteria. If it was fast and decent, it wasn’t free, and if it was close to free, it wasn’t decent.

Lately, though, that’s been changing. Today’s mobile devices have both powerful internal processors and broadband connections to external, cloud-based speech transcription engines. Nuance introduced its “Dragon Dictation” app for Apple iOS devices in 2009, giving users the ability to dictate short stretches of text—about a paragraph. Smartphones with Google’s Android operating system have had a built-in Voice Actions feature since 2010. In 2011, Apple came out with the iPhone 4S, which had dictation capabilities, not to mention the speech-driven Siri virtual personal assistant, baked in. And this year, Apple put dictation into both the third-generation iPad and the Mountain Lion update of its Mac OS X operating system.

One of the big constraints in all of these systems, right now, is on the length of the passage that can be transcribed. The Google, Nuance, and Apple technology works great for dictating reminders, text messages, short e-mails, and the like, but it can’t handle continuous speech. I’m guessing that’s because all of the heavy lifting (identifying speech sounds and probabilistically assigning text to them) is happening in the cloud, and there’s a limit on the size of the sound files that can be uploaded and deciphered in one go.

Another, bigger hurdle is that today’s commercial speech recognition technology still has a very hard time dealing with multiple voices, especially if they’re talking over one another (as humans routinely do). The Holy Grail would be a service that provided continuous, speaker-independent transcription of conversations between two or more people. The finished transcripts would be fodder not just for search engines but for a new wealth of newspaper, magazine, and blog stories.

Thompson predicted that Google will be the first to bring together all the elements of the vision, and I think that’s a good bet, given the company’s enormous computational resources, its experience with services like Google 411 and automatic YouTube captioning, and the depth of its bench in areas like natural language processing and machine translation. But you can’t count out Nuance or Apple (which uses Nuance’s technology in Siri and the iOS dictation feature), or research institutions such as SRI International, which are also thinking hard about this stuff.

I’m ready for the Speakularity now—but realistically, I’ll probably have to keep taking manual notes for the next few years. Just cut me a break if I’m interviewing you, my buffer flows over, and I have to ask you to rewind.

Author: Wade Roush

Between 2007 and 2014, I was a staff editor for Xconomy in Boston and San Francisco. Since 2008 I've been writing a weekly opinion/review column called VOX: The Voice of Xperience. (From 2008 to 2013 the column was known as World Wide Wade.) I've been writing about science and technology professionally since 1994. Before joining Xconomy in 2007, I was a staff member at MIT’s Technology Review from 2001 to 2006, serving as senior editor, San Francisco bureau chief, and executive editor of TechnologyReview.com. Before that, I was the Boston bureau reporter for Science, managing editor of supercomputing publications at NASA Ames Research Center, and Web editor at e-book pioneer NuvoMedia. I have a B.A. in the history of science from Harvard College and a PhD in the history and social study of science and technology from MIT. I've published articles in Science, Technology Review, IEEE Spectrum, Encyclopaedia Britannica, Technology and Culture, Alaska Airlines Magazine, and World Business, and I've been a guest of NPR, CNN, CNBC, NECN, WGBH and the PBS NewsHour. I'm a frequent conference participant and enjoy opportunities to moderate panel discussions and on-stage chats. My personal site: waderoush.com. My social media coordinates: Twitter: @wroush; Facebook: facebook.com/wade.roush; LinkedIn: linkedin.com/in/waderoush; Google+: google.com/+WadeRoush; YouTube: youtube.com/wroush1967; Flickr: flickr.com/photos/wroush/; Pinterest: pinterest.com/waderoush/