“significant jump in performance” simply because the team was now able to train it using tens of millions of images, instead of tens of thousands, and to parallelize the work across thousands of computers.
“Data is the key for pretty much everything we do,” Neven says. “It’s often more critical than the innovation on the algorithmic side. A dumb algorithm with more data beats a smart algorithm with less data.”
In practice, Neven’s team has been throwing both algorithms and data at the general computer vision problem. Goggles isn’t built around a single statistical model, but a variety of them. “A modern computer vision algorithm is a complex building with many stories and little towers on the side,” Neven says. “Whenever I visit a university and I see a piece that I could add, we try to find an arrangement with the researchers to bring third-party recognition software into Goggles as we go. We have the opposite of ‘Not Invented Here’ syndrome. If we find something good, we will add it.”
Goggles is really good at reading text (and translating it, if asked); it can work wonders with a business card or a wine label. If it has a good, close-up image to work with, it’s not bad at identifying random objects—California license plates, for example. And if it can’t figure out what it’s looking at, it can, at the very least, direct you to a collection of images with similar colors and layouts. “We call that internally the Fail Page, but it gives the user something, and over time this will show up less and less,” Neven says.
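Neven's description suggests a pipeline that fans an image out to several independent recognizers and falls back to similarity search when none of them is confident. The Python sketch below is purely illustrative of that shape: the `Match` type, the `recognize` dispatcher, the 0.6 confidence threshold, and the `find_similar_images` fallback are all assumptions for the sake of the example, not Google's actual implementation.

```python
# Illustrative sketch only: mimics the multi-model dispatch Neven describes.
# Every name, threshold, and recognizer here is hypothetical, not Google's code.

from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Match:
    label: str         # e.g. "California license plate" or "wine label: ..."
    confidence: float  # 0.0 to 1.0, as reported by the recognizer

# Each recognizer is one "story" of Neven's building: OCR, landmarks,
# artwork, products, and so on. Third-party modules plug in the same way.
Recognizer = Callable[[bytes], Optional[Match]]

def recognize(image: bytes,
              recognizers: List[Recognizer],
              threshold: float = 0.6) -> dict:
    """Run every model on the image and keep the most confident answer."""
    candidates = [m for r in recognizers if (m := r(image)) is not None]
    if candidates:
        best = max(candidates, key=lambda m: m.confidence)
        if best.confidence >= threshold:
            return {"status": "match", "label": best.label}
    # No model is confident: return visually similar images instead
    # (the "Fail Page" - it still gives the user something).
    return {"status": "similar", "results": find_similar_images(image)}

def find_similar_images(image: bytes) -> List[str]:
    """Placeholder for a color-and-layout similarity search."""
    return ["similar-1.jpg", "similar-2.jpg"]
```

One consequence of this design, consistent with Neven's "opposite of Not Invented Here" remark, is that adding a new recognition capability means appending another recognizer to the list rather than retraining a single monolithic model.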
As even Neven acknowledges, Goggles isn’t yet a universal visual search tool; that’s why it’s still labeled as a Google Labs project, not an officially supported Google product. Its ability to identify nearly 200,000 works by famous painters, for example, is a computational parlor trick that, in truth, doesn’t add much to its everyday utility. The really hard work—getting good at identifying random objects that don’t have their own Wikipedia entries—is still ahead. “What keeps me awake at night is, ‘What are the honest-to-God use cases that we can deliver,’ where it’s not just an ‘Oh, wow,'” Neven says. “We call it the bar of daily engagement. Can we make it useful enough that every day you will take out Goggles and do something with it?”
But given the huge amount of learning material Google collects from the Web every day, the company’s image recognition algorithms are likely to clear that bar more and more often. They have savant-like skill in some areas: they can tell Amur leopards from clouded leopards based on their spot patterns. They can round up images not just of tulips but of white tulips. The day isn’t all that far away, it seems clear, when Goggles will come close to fulfilling Neven’s image of the bird looking over your shoulder, always ready to tell you what you’re seeing.
The Next Great Stage of Search
What reaching this point might mean on a sociocultural level—in areas like travel and commerce, learning and education, surveillance and privacy—is a question that we’ll probably have to confront sooner than we expected. Why? Because it’s very clear that this is where Google wants to go.
Here’s how Schmidt put it in his speech: “When I walk down the streets of Berlin, I love history, [and] what I want is, I want the computer, my smartphone, to be