Inside Google’s Age of Augmented Humanity, Part 3: Computer Vision Puts a “Bird on Your Shoulder”

“significant jump in performance” simply because the team was now able to train it using tens of millions of images, instead of tens of thousands, and to parallelize the work across thousands of computers.

“Data is the key for pretty much everything we do,” Neven says. “It’s often more critical than the innovation on the algorithmic side. A dumb algorithm with more data beats a smart algorithm with less data.”

In practice, Neven’s team has been throwing both algorithms and data at the general computer vision problem. Goggles isn’t built around a single statistical model, but a variety of them. “A modern computer vision algorithm is a complex building with many stories and little towers on the side,” Neven says. “Whenever I visit a university and I see a piece that I could add, we try to find an arrangement with the researchers to bring third-party recognition software into Goggles as we go. We have the opposite of ‘Not Invented Here’ syndrome. If we find something good, we will add it.”

Goggles is really good at reading text (and translating it, if asked); it can work wonders with a business card or a wine label. If it has a good, close-up image to work with, it’s not bad at identifying random objects—California license plates, for example. And if it can’t figure out what it’s looking at, it can, at the very least, direct you to a collection of images with similar colors and layouts. “We call that internally the Fail Page, but it gives the user something, and over time this will show up less and less,” Neven says.

As even Neven acknowledges, Goggles isn’t yet a universal visual search tool; that’s why it’s still labeled as a Google Labs project, not an officially supported Google product. Its ability to identify nearly 200,000 works by famous painters, for example, is a computational parlor trick that, in truth, doesn’t add much to its everyday utility. The really hard work—getting good at identifying random objects that don’t have their own Wikipedia entries—is still ahead. “What keeps me awake at night is, ‘What are the honest-to-God use cases that we can deliver,’ where it’s not just an ‘Oh, wow,’” Neven says. “We call it the bar of daily engagement. Can we make it useful enough that every day you will take out Goggles and do something with it?”

But given the huge amount of learning material Google collects from the Web every day, the company’s image recognition algorithms are likely to clear that bar more and more often. They have savant-like skill in some areas: they can tell Amur leopards from clouded leopards, based on their spot patterns. They can round up images not just of tulips but of white tulips. The day isn’t all that far away, it seems clear, when Goggles will come close to fulfilling Neven’s image of the bird looking over your shoulder, always ready to tell you what you’re seeing.

The Next Great Stage of Search

What reaching this point might mean on a sociocultural level—in areas like travel and commerce, learning and education, surveillance and privacy—is a question that we’ll probably have to confront sooner than we expected. Why? Because it’s very clear that this is where Google wants to go.

Here’s how Schmidt put it in his speech: “When I walk down the streets of Berlin, I love history, [and] what I want is, I want the computer, my smartphone, to be

Author: Wade Roush

Between 2007 and 2014, I was a staff editor for Xconomy in Boston and San Francisco. Since 2008 I've been writing a weekly opinion/review column called VOX: The Voice of Xperience. (From 2008 to 2013 the column was known as World Wide Wade.) I've been writing about science and technology professionally since 1994. Before joining Xconomy in 2007, I was a staff member at MIT’s Technology Review from 2001 to 2006, serving as senior editor, San Francisco bureau chief, and executive editor of TechnologyReview.com. Before that, I was the Boston bureau reporter for Science, managing editor of supercomputing publications at NASA Ames Research Center, and Web editor at e-book pioneer NuvoMedia. I have a B.A. in the history of science from Harvard College and a PhD in the history and social study of science and technology from MIT. I've published articles in Science, Technology Review, IEEE Spectrum, Encyclopaedia Britannica, Technology and Culture, Alaska Airlines Magazine, and World Business, and I've been a guest of NPR, CNN, CNBC, NECN, WGBH and the PBS NewsHour. I'm a frequent conference participant and enjoy opportunities to moderate panel discussions and on-stage chats.

My personal site: waderoush.com

My social media coordinates: Twitter: @wroush; Facebook: facebook.com/wade.roush; LinkedIn: linkedin.com/in/waderoush; Google+: google.com/+WadeRoush; YouTube: youtube.com/wroush1967; Flickr: flickr.com/photos/wroush/; Pinterest: pinterest.com/waderoush/