Google Gets A Second Brain, Changing Everything About Search

Google Knowledge Graph

take many new types of relevancy signals into account. The improvements were so dramatic that Singhal was later made a Google Fellow and awarded a prize “in the millions of dollars,” according to journalist Steve Levy’s account of Google’s early years, In the Plex.

Still, despite these accomplishments, Singhal says the history of search is basically one big kludge designed to simulate actual human understanding of language.

“The compute power was not there and various other pieces were not there, and the most effective way to search ended up being what today is known as keyword-based search,” Singhal explains. “You give us a query, we find out what is important in that query, and we find out if those important words are also important in a document, using numerous heuristics. This process worked incredibly well—we built the entire field of search on it, including every search company you know of, Google included. But the dream to actually go farther and get closer to human understanding was always there.”
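To make the idea concrete, here is a minimal Python sketch of keyword-based scoring. It uses TF-IDF (term frequency times inverse document frequency) as a stand-in for the “numerous heuristics” Singhal mentions. Google’s actual ranking signals are not described in this article, and the three-document corpus and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; real engines do far more.
    return text.lower().split()


def idf(term, docs):
    # Inverse document frequency: terms found in fewer documents
    # count as more "important".
    df = sum(1 for d in docs if term in d)
    return math.log((1 + len(docs)) / (1 + df)) + 1


def score(query, doc, docs):
    # For each distinct query word, multiply how often it appears in this
    # document by its IDF weight, then sum the results.
    counts = Counter(doc)
    return sum(counts[t] * idf(t, docs) for t in set(tokenize(query)))


# Hypothetical toy corpus, tokenized once up front.
corpus = [
    "taj mahal mausoleum in agra",
    "taj mahal blues musician discography",
    "indian restaurant menu and hours",
]
docs = [tokenize(c) for c in corpus]

ranked = sorted(
    range(len(docs)),
    key=lambda i: score("taj mahal blues", docs[i], docs),
    reverse=True,
)
print(ranked)  # [1, 0, 2]
```

For the bare query “taj mahal,” the first two documents receive identical scores: word matching alone gives the engine no reason to prefer one meaning over the other.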

After his initial rewrite of Google’s relevance algorithms, Singhal went on to tackle other problems such as morphological analysis: figuring out how to reduce words like “runner” and “running” to their roots (“run,” in this case) in order to perform broader searches, while also learning to sidestep anomalies (“apple” and “Apple” obviously come from the same root but have very different meanings in the real world). Universal search came next, then autocomplete and Google Instant, which begins to return customized search results even before a user finishes typing a query. (Type “wea,” for example, and you’ll get a local weather forecast.)
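A crude version of the root-finding problem can be sketched with suffix stripping. The rule set and the PROTECTED exception list below are hypothetical simplifications (production stemmers such as the Porter algorithm use far more rules), and the exception list is only a stand-in for the anomaly handling Singhal describes, keeping the company “Apple” separate from the fruit.

```python
# Hypothetical, deliberately tiny rule set; real stemmers use many more rules.
SUFFIXES = ("ing", "er", "s")

# Hypothetical exception list: capitalized names that must not be merged
# with an ordinary word sharing the same letters.
PROTECTED = {"Apple"}


def stem(word):
    if word in PROTECTED:
        return word  # keep the company distinct from the fruit
    w = word.lower()
    for suf in SUFFIXES:
        # Only strip if a stem of at least three letters remains.
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            w = w[: -len(suf)]
            # Undo consonant doubling, e.g. "runn" -> "run", but leave
            # vowel pairs and endings like "ll" and "ss" intact.
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiouls":
                w = w[:-1]
            break
    return w


for w in ["runner", "running", "runs", "apples", "Apple"]:
    print(w, "->", stem(w))
```

Without the exception list, lowercasing would collapse “Apple” and “apple” into one root, which is exactly the kind of anomaly that purely letter-based processing stumbles over.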

“But throughout this process, one thing always bothered us,” Singhal says. “It was that we didn’t ever represent the real world properly in the computer. It was still all a lot of statistical magic, built on top of runs of letters. Even though it almost looked like an incredibly intelligent computer, and we did it far better than anyone, the truth was it was still working on strings of letters.”

This frustration wasn’t just a matter of intellectual aesthetics. Singhal says that by 2009 or 2010, Google had run up against a serious barrier. The goal of the company’s search engineers had always been to connect users with the information they need as efficiently as possible. But for a large group of ambiguous search terms, statistical correlations alone couldn’t help Google intuit the user’s intent. Take Singhal’s favorite example, Taj Mahal. Is the user who types that query searching for the famous mausoleum in Uttar Pradesh (Singhal’s home state), the Grammy-winning blues musician, or the Indian restaurant down the street? Google’s engineers realized that using statistics alone, “we would never be able to say that one of those [interpretations] was more important than the other,” Singhal says.

“I’m very proud of what we achieved using the statistical method, and we still have huge components of our system that are built upon that,” Singhal says. “But we couldn’t take that to the system that we would all want five years from now. Those statistical matching approaches were starting to hit some fundamental limits.”

What Google needed was a way to know more about all of the world’s Taj Mahals, so that it could get better at guessing which one a user wants based on other contextual clues such as their location. And that’s where Metaweb comes into the story. “They were on this quest to represent real-world things, entities, and what is important, what should be known about them,” says Singhal. When Google came across the startup, Metaweb had just 12 million entities in its database, which Singhal calls “a toy” compared to the real world. “But we saw the promise in the representation technology, and the process they had built to scale that to what we really needed to build a representation of the real world.”
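The move from matching letters to representing entities can be illustrated with a toy entity store. In this hypothetical Python sketch, each Taj Mahal is a separate record with a type and a city, and the user’s location serves as the contextual clue. The records, the city attribute, and the fallback rule are assumptions for illustration, not how Freebase or Google’s Knowledge Graph actually stores or ranks entities.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    name: str
    kind: str
    city: str  # hypothetical attribute used as a context clue ("" if none)


# Hypothetical records; not real Freebase or Knowledge Graph data.
ENTITIES = [
    Entity("Taj Mahal", "mausoleum", "Agra"),
    Entity("Taj Mahal", "musician", ""),
    Entity("Taj Mahal", "restaurant", "Boston"),
]


def interpret(query: str, user_city: str) -> Optional[Entity]:
    # Every entity whose name matches the query is a candidate meaning.
    candidates = [e for e in ENTITIES if e.name.lower() == query.lower()]
    if not candidates:
        return None
    # Prefer a candidate located in the user's city; otherwise fall back
    # to the first candidate (a stand-in for a global popularity prior).
    local = [e for e in candidates if e.city and e.city == user_city]
    return (local or candidates)[0]


print(interpret("taj mahal", "Boston").kind)
print(interpret("taj mahal", "Denver").kind)
```

A real system would weigh many signals at once; the single location check here only shows why separate records for each meaning give the engine something to choose between.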

The Database of Everything

Metaweb Technologies has a fascinating history of its own. The company was born as a 2005 spinoff of Applied Minds, the Glendale, CA-based consulting firm and invention factory founded five years before by former Disney R&D head Bran Ferren and former Thinking Machines CEO Danny Hillis. John Giannandrea, a director of engineering at Google who was Metaweb’s chief technology officer, says the idea behind the startup was to build “a machine-readable encyclopedia” to help computers mimic human understanding.

“If you and I are having a conversation, we share a vocabulary,” says Giannandrea, who came to Metaweb after CTO roles at Tellme Networks and Netscape/AOL. “If I say ‘fiscal cliff,’ you know what I mean, and the reason is that you have a dictionary in your head about ideas. Computers don’t have that. That’s what we set about doing.”

The knowledge base Metaweb built is called Freebase, and it’s still in operation today. It’s a collaborative database—technically, a semantic graph—that grows through the contributions of

Author: Wade Roush

Between 2007 and 2014, I was a staff editor for Xconomy in Boston and San Francisco. Since 2008 I've been writing a weekly opinion/review column called VOX: The Voice of Xperience. (From 2008 to 2013 the column was known as World Wide Wade.) I've been writing about science and technology professionally since 1994. Before joining Xconomy in 2007, I was a staff member at MIT’s Technology Review from 2001 to 2006, serving as senior editor, San Francisco bureau chief, and executive editor of TechnologyReview.com. Before that, I was the Boston bureau reporter for Science, managing editor of supercomputing publications at NASA Ames Research Center, and Web editor at e-book pioneer NuvoMedia. I have a B.A. in the history of science from Harvard College and a PhD in the history and social study of science and technology from MIT. I've published articles in Science, Technology Review, IEEE Spectrum, Encyclopaedia Britannica, Technology and Culture, Alaska Airlines Magazine, and World Business, and I've been a guest of NPR, CNN, CNBC, NECN, WGBH and the PBS NewsHour. I'm a frequent conference participant and enjoy opportunities to moderate panel discussions and on-stage chats.

My personal site: waderoush.com
My social media coordinates:
Twitter: @wroush
Facebook: facebook.com/wade.roush
LinkedIn: linkedin.com/in/waderoush
Google+: google.com/+WadeRoush
YouTube: youtube.com/wroush1967
Flickr: flickr.com/photos/wroush/
Pinterest: pinterest.com/waderoush/