Inside Google’s Age of Augmented Humanity: Part 1

voice translation, when we do picture identification, all [the smartphone] does is send a request to the supercomputers that then do all the work.”

And the key thing about those supercomputers—though Schmidt alluded to it only briefly—is that they’re stuffed with data, petabytes of data about what humans say and write and where they go and what they like. This data is drawn from the real world, generated by the same people who use all of Google’s services. And the company’s agility when it comes to collecting, storing, and analyzing it is perhaps its greatest but least appreciated capability.

The power of this data was the one consistent theme in a series of interviews I conducted in late 2010 with Google research directors in the fundamental areas of speech recognition, machine translation, and computer vision. It turns out that many of the problems that have stymied researchers in cognitive science and artificial intelligence for decades—understanding the rules behind grammar, for instance, or building models of perception in the visual cortex—give way before great volumes of data, which can simply be mined for statistical connections.

Unlike the large, structured language corpora used by the speech-recognition and machine-translation experts of yesteryear, this data doesn’t have to be transcribed or annotated to yield insights. The structure and the patterns arise from the way the data was generated and the contexts in which Google collects it. It turns out, for example, that meaningful relationships can be extracted from search logs—the more people search for “IBM stock price” or “Apple Computer stock price,” the clearer it becomes that there is a class of things, namely companies, with an attribute called “stock price.” Google’s algorithms glean this structure from the behavior of the company’s own users, in a process computer scientists call “unsupervised learning.”
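
To make the idea concrete, here is a minimal, purely illustrative sketch in Python of the kind of pattern-mining described above. The toy query log, the entity/attribute splitting heuristic, and all the names in it are my own assumptions for the sake of the example; Google's actual systems operate over logs many orders of magnitude larger, with far more sophisticated statistical machinery.

```python
from collections import defaultdict

# Toy stand-in for a search query log. This is only meant to illustrate
# the shape of the idea, not how Google's pipeline actually works.
query_log = [
    "IBM stock price",
    "Apple Computer stock price",
    "Exxon stock price",
    "Paris population",
    "Tokyo population",
    "IBM headquarters",
]

# Map each candidate attribute phrase to the entities that co-occur with it.
entities_by_attribute = defaultdict(set)
for query in query_log:
    words = query.split()
    # Crude heuristic (an assumption for this sketch): treat the last one
    # or two tokens as the attribute and the remaining tokens as the entity.
    for split_at in (len(words) - 1, len(words) - 2):
        if split_at > 0:
            entity = " ".join(words[:split_at])
            attribute = " ".join(words[split_at:])
            entities_by_attribute[attribute].add(entity)

# An attribute shared by several distinct entities hints at a class:
# "stock price" groups IBM, Apple Computer, and Exxon together.
for attribute, entities in sorted(entities_by_attribute.items()):
    if len(entities) >= 2:
        print(f"{attribute!r} looks like a shared attribute of: {sorted(entities)}")
```

Even in this toy form, the attribute "stock price" pulls IBM, Apple Computer, and Exxon into a single implied class of companies, which is the flavor of structure the real systems extract at vastly greater scale, noise and all.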

“This is a form of artificial intelligence,” Schmidt observed in Berlin. “It’s intelligence where the computer does what it does well and it helps us think better…The computer and the human, together, each does something better because the other is helping.”

In a series of three articles this week, I’ll look more closely at this human-computer symbiosis and how Google is exploiting it, starting with the area of speech recognition. (Subsequent articles will examine machine translation and computer vision.) Research in these areas is advancing so fast that the outlines of Schmidt’s vision of augmented humanity are already becoming clear, especially for owners of Android phones, where Google deploys its new mobile technologies first and most deeply.

Obviously, Google has competition in the market for mobile information services. Over time, its biggest competitor in this area is likely to be Apple, which controls one of the world’s most popular smartphone platforms and recently acquired, in the form of a startup called Siri, a search and personal-assistant technology built on many of the same machine-learning principles espoused by Google’s researchers.

But Google has substantial assets in its favor: a large and talented research staff, one of the world’s largest distributed computing infrastructures, and most importantly, a vast trove of data for unsupervised learning. It seems likely, therefore, that much of the innovation making our phones more powerful over the coming years will emerge from Mountain View.

The Linguists and the Engineers

Today Michael Cohen leads Google’s speech technology efforts. But he actually started out as a composer and guitarist, making a living for seven years writing music for piano, violin, orchestra, and jazz bands. As a musician, he says, he was always interested in the mechanics of auditory perception—why certain kinds of sound make musical sense to the human brain, while others are just noise.

A side interest in computer music eventually led him into computer science proper. “That very naturally led me, first of all, to wanting to work on something relating to

Author: Wade Roush
