Inside Google’s Age of Augmented Humanity: Part 1

fine-tune its own language models. “If the last two words I saw were ‘the dog’ and I have a little ambiguity about the next word, it’s more likely to be ‘ran’ than ‘pan,'” Cohen explains. “The language models tell you the probabilities of all possible next words. We have been able to train enormous language models for Voice Search because we have so much textual data from Google.com.”

Over time, speech recognition capabilities have popped up in more and more Google products. When Google Voice went public in the spring of 2009, it included a voicemail transcription feature courtesy of Cohen’s team. Early in 2010, YouTube began using Google’s transcription engine to publish written transcripts alongside every YouTube video, and YouTube viewers now have the option of seeing the transcribed text on screen, just like closed-captioning on television.

But mobile is still where most of the action is. Google’s Voice Actions service, introduced last August, lets Android users control their phones via voice—for instance, they can initiate calls, send e-mail and text messages, call up music, or search maps on the Web. (This feature is called Voice Commands on some phones.) And the Voice Input feature on certain Android phones adds a microphone button to the virtual keypad, allowing users to speak within any app where text entry is required.

“In general, our vision for [speech recognition on] mobile is complete ubiquity,” says Cohen. “That’s not where we are now, but it is where we are trying to get to. Anytime the user wants to interact by voice, they should be able to.” That even includes interacting with speakers of other languages: Cohen says Google’s speech recognition researchers work closely with their colleagues in machine translation—the subject of the next article in this series—and that the day isn’t far off when the two teams will be able to release a “speech in, speech out” application that combines speech recognition, machine translation, and speech synthesis for near-real-time translation between people speaking different languages.

“The speech effort could be viewed as something that enhances almost all of Google’s services,” says Cohen. “We can organize your voice mails, we can show you the information on the audio track of a YouTube video, you can do searches by voice. A large portion of the world’s information is spoken—that’s the bottom line. It was a big missing piece of the puzzle, and it needs to be included. It’s an enabler of a much wider array of usage scenarios, and I think that what we’ll see over time is all kinds of new applications that people would never have thought of before,” all of them powered by user-provided training data. Which is precisely what Schmidt had in mind in Berlin when he quoted sci-fi author William Gibson: “Google is made of us, a sort of coral reef of human minds and their products.”

Coming in Part 2: A look at the role of big data in Google’s machine translation effort, led by Franz Josef Och.

[Update, 2/28/11: Click here for a convenient single-page version of all three parts of “Inside Google’s Age of Augmented Humanity.”]

Author: Wade Roush

Between 2007 and 2014, I was a staff editor for Xconomy in Boston and San Francisco. Since 2008 I've been writing a weekly opinion/review column called VOX: The Voice of Xperience. (From 2008 to 2013 the column was known as World Wide Wade.) I've been writing about science and technology professionally since 1994. Before joining Xconomy in 2007, I was a staff member at MIT’s Technology Review from 2001 to 2006, serving as senior editor, San Francisco bureau chief, and executive editor of TechnologyReview.com. Before that, I was the Boston bureau reporter for Science, managing editor of supercomputing publications at NASA Ames Research Center, and Web editor at e-book pioneer NuvoMedia. I have a B.A. in the history of science from Harvard College and a PhD in the history and social study of science and technology from MIT. I've published articles in Science, Technology Review, IEEE Spectrum, Encyclopaedia Brittanica, Technology and Culture, Alaska Airlines Magazine, and World Business, and I've been a guest of NPR, CNN, CNBC, NECN, WGBH and the PBS NewsHour. I'm a frequent conference participant and enjoy opportunities to moderate panel discussions and on-stage chats. My personal site: waderoush.com My social media coordinates: Twitter: @wroush Facebook: facebook.com/wade.roush LinkedIn: linkedin.com/in/waderoush Google+ : google.com/+WadeRoush YouTube: youtube.com/wroush1967 Flickr: flickr.com/photos/wroush/ Pinterest: pinterest.com/waderoush/