Diffbot Challenges Google Supremacy With Rival Knowledge Graph

assemble information about competitors and suppliers, or people they might want to hire.

“All these entities leave footprints on the Web,” Tung says. Diffbot is now developing frameworks for machine analysis of new kinds of pages, such as events, locations, and profile pages, to further expand its Global Index.

If you’re a high school freshman with an English term paper due tomorrow, you might be wondering how you can log in to this new type of search to get all the facts you need about Charles Dickens, by midnight tonight, painlessly assembled by a machine.

For the most part, though, consumers can’t yet directly tap into the structured Web data compiled by Diffbot and Google. Diffbot shares its data resources with consumers indirectly by selling its services to search engines and app developers. A Bing search about a product, for example, will show the traditional list of website links, but in the upper right-hand corner, it may display an image, price, and other facts assembled about the product—structured data.
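To make the idea concrete, the kind of structured product record a search engine might assemble from a Web page could look something like the sketch below. This is purely illustrative: the field names and the rendering are hypothetical, not Diffbot's or Bing's actual schema.

```python
# Hypothetical sketch of a structured product record extracted from a
# Web page. Field names are illustrative only, not an actual Diffbot
# or Bing schema.
product = {
    "type": "product",
    "title": "Example Wireless Headphones",
    "price": 79.99,
    "currency": "USD",
    "image": "https://example.com/headphones.jpg",
    "availability": "in_stock",
}

def summary_card(record):
    """Render the kind of compact fact card a search sidebar might show."""
    return (f'{record["title"]}: {record["currency"]} '
            f'{record["price"]:.2f} ({record["availability"]})')

print(summary_card(product))
```

The point of the sketch is the contrast the article draws: instead of a list of links, the machine holds discrete, labeled facts (title, price, availability) that it can display, compare, or act on directly.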

Google is also making some of its Knowledge Graph findings available through mobile searches. But Tung speculates that Google may limit these kinds of search returns in favor of traditional Web page listings, because they expose consumers to more advertising, the tech titan’s source of revenue.

In his 2014 article, Roush pondered what would happen if Diffbot remained an independent company, grew to 10,000 employees, and vied with Google to control our online existence. But Roush, wistfully, found it more likely that Diffbot would be “acqui-hired” by Google at some point.

Tung brought up this prediction when I talked to him this week—mainly to refute it. He didn’t sound like a guy who was ready to let somebody else discover his company’s full potential.

“We have received a lot of acquisition offers from pretty much all of the large technology companies,” Tung says. Diffbot’s response to the offers, he says, is to convert its suitors into customers. He declines to say whether Google belongs to either category.

So what future is Tung aiming an independent Diffbot toward?

My sense is that Tung is an artificial intelligence researcher at heart, captivated by questions about what machines could do if humans knew how to equip their silicon brains and train them expertly.

Here’s part of his shorter-term vision: Technologists are not just organizing Web data to better inform humans so they can decide what to do or to think. They’re also structuring the Web to better inform machines, so they can take action themselves and work with other machines.

The first products along these lines may be as cozy as recipe apps. You might be able to point your mobile phone at an unfurnished corner of your new living room, so that in a few minutes it will tell you what chair on the market would fit in the space, and look good with the rest of your decor, Tung says.

He gave another example: The printer in your office runs out of ink, knows whether it needs a black or color cartridge, knows which manufacturers’ products are compatible, taps into the Web, compares prices, and executes the order.

“We think that’s the exciting future,” Tung says. “Everything’s intelligent, and they all need access to information.”

But beyond those limited chores, there’s still a big question out there: Will machines ever duplicate human intelligence? For example, could they exercise judgment by balancing the benefits and consequences of two different courses of action, such as choices between medical treatments or business strategies?

“Progress toward human intelligence is still quite a rocky road ahead,” Tung says. The artificial intelligence community hasn’t yet hit on the missing link that would make that possible, he says. “It’s going to require a breakthrough that’s still unknown.”

But the Diffbot team is trying to make training computers more sophisticated by rendering the Web—the digital repository of a growing swath of human knowledge—readable by machines.

The big successes already gained in artificial intelligence have not necessarily come from new programming wizardry, but through the application of old algorithms to dense data sets, Tung and fellow Diffbot executive John Davi point out.

Achievements in computer image classification have built on the burgeoning population of digitized images that were not available in earlier decades, they say. The IBM computer Watson performed feats of medical diagnosis by drawing from a concentrated trove of human-curated knowledge in that relatively narrow realm of data, Tung and Davi add.

“Our working theory is, if we can assemble enough structured, labeled data, we can simulate all aspects of human intelligence,” Tung says. The inflection point in artificial intelligence may be assembling trillions of objects from the Web that machines can read, he says.

“We’re working on what could be the missing piece, which is the data,” Tung says.

Author: Bernadette Tansey

Bernadette Tansey is a former editor of Xconomy San Francisco. She has covered information technology, biotechnology, business, law, environment, and government as a Bay Area journalist. She has written about edtech, mobile apps, social media startups, and life sciences companies for Xconomy, and tracked the adoption of Web tools by small businesses for CNBC. She was a biotechnology reporter for the business section of the San Francisco Chronicle, where she also wrote about software developers and early commercial companies in nanotechnology and synthetic biology.