Rivaling Google, Web-Mining Diffbot Opens Its Knowledge Graph to All

Diffbot, a tech startup that continuously scours the Web to assemble a “knowledge graph” of billions of facts in context, announced today that it’s opening up the searchable resource to the public—with starter rates as low as a cable TV bill.

Mountain View, CA-based Diffbot gleans unstructured data scattered across websites, ads, blog posts, videos, and other public online assets to create a knowledge repository that can be mined by companies for their specific purposes. Users can direct the Diffbot Knowledge Graph to compile a company profile, surface good job candidates, track the reputations of products, or keep their contact lists up to date, for example. The company says it deploys machine learning software, computer vision, and natural language processing to generate and revise the data trove automatically, without the need for human curation.

Back in mid-2015, when Diffbot’s global Web index had amassed some 19 billion facts characterizing 600 million entities (such as people or products), the company declared it had surpassed the size of Google’s Knowledge Graph. Diffbot says its database now holds 1 trillion facts and covers 10 billion entities, and is almost 500 times as large as the Google Knowledge Graph.

The Diffbot AI engine, which operated in beta mode until now, has attracted a select group of about 450 customers, including Cisco, Salesforce, Crunchbase, eBay, and Adobe. CEO Mike Tung says his company’s knowledge graph also augments search results for Microsoft’s Bing and another search engine, DuckDuckGo.

The company, founded in 2008, is now ready to grow its customer base by making the “knowledge-as-a-service” tool more user-friendly to non-developers, and by opening the door to any interested business.

“We really want to democratize this,” CEO and founder Mike Tung says. Diffbot already serves startups as well as big tech companies, and the public launch will make its knowledge graph available to a wider class of business professionals who use data in their jobs, he says.

One example: Tung says Diffbot is being used by Factmata, one of the startups that drew financial support from Google for using technology to fact-check news stories and combat false reports. The Diffbot algorithms not only scoop up online facts, but also evaluate their probability of truth, and check for logical consistency, he says. If someone declared “Mike Tung works on Venus,” the system would check his address and calculate his daily commute in millions of miles before rejecting the claim as questionable, Tung says.
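The kind of consistency check Tung describes can be illustrated with a toy sketch. This is not Diffbot's code; the `WorkplaceClaim` type, the `check_claim` function, the 200-mile cutoff, and the 12-mile office commute are all hypothetical, and the Venus distance is only a rough figure for the planet's closest approach to Earth.

```python
from dataclasses import dataclass

# Assumed cutoff for a believable one-way commute; purely illustrative.
MAX_ONE_WAY_COMMUTE_MILES = 200.0


@dataclass
class WorkplaceClaim:
    person: str
    workplace: str
    one_way_miles: float  # distance from the person's known home address


def check_claim(claim, max_miles=MAX_ONE_WAY_COMMUTE_MILES):
    """Return a (verdict, daily round-trip miles) pair for a workplace claim."""
    round_trip = 2 * claim.one_way_miles
    if claim.one_way_miles > max_miles:
        return ("questionable", round_trip)
    return ("consistent", round_trip)


# Roughly 25 million miles at closest approach (approximate figure).
venus = WorkplaceClaim("Mike Tung", "Venus", 25_000_000.0)
# Made-up distance for an ordinary office commute.
office = WorkplaceClaim("Mike Tung", "Mountain View office", 12.0)

print(check_claim(venus)[0])
print(check_claim(office)[0])
```

A real system would presumably weigh many such signals and produce a probability rather than a hard cutoff. The single threshold here only shows the shape of the reasoning.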

While a phalanx of Russian trolls might flood public commentaries on social media with the same false message, the Diffbot engine doesn’t accept quantity as the measure of a statement’s truth, Tung says. Diffbot’s algorithms can trace the messages to their source, check the sources’ record of distributing truthful statements, and discover the interconnections among the sources, he says. Tung declined to say whether Google, Facebook, and other companies trying to root out fake news are among Diffbot’s customers.
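One way to picture scoring that ignores sheer volume is to collapse repeated messages to their original sources and weigh each source by its track record. The sketch below is an assumption-laden illustration, not Diffbot's algorithm; the `claim_score` function, the `origin` field, the source names, and the reliability numbers are all invented for the example.

```python
def claim_score(reports, source_reliability, default_reliability=0.5):
    """Score a claim by its distinct original sources, not by repetition.

    reports: list of dicts, each with an "origin" key naming the source
             a message was traced back to (hypothetical field).
    source_reliability: dict mapping an origin to a 0..1 track-record score.
    """
    # Collapse reposts: a thousand copies from one origin count once.
    origins = {report["origin"] for report in reports}
    if not origins:
        return 0.0
    # Average the track records of the independent origins.
    total = sum(source_reliability.get(o, default_reliability) for o in origins)
    return total / len(origins)


# Invented reliability figures for illustration only.
reliability = {"troll_farm": 0.05, "wire_service_a": 0.9, "wire_service_b": 0.9}

flood = [{"origin": "troll_farm"} for _ in range(1000)]
corroborated = [{"origin": "wire_service_a"}, {"origin": "wire_service_b"}]

print(claim_score(flood, reliability))
print(claim_score(corroborated, reliability))
```

In this toy version, a thousand identical posts traced to one low-reliability origin score far below two reports from independent sources with good records. It leaves out the interconnections among sources that Tung mentions.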

When it was a tiny startup working out of a Palo Alto bungalow, Diffbot drew the support of angel investors including Sun Microsystems co-founder Andy Bechtolsheim and other tech founders and company executives. It scored the first investment made by Stanford University’s venture capital fund, StartX. After an angel round was topped off at $3 million with an investment from Bloomberg Beta in mid-2015, Diffbot raised $10 million seven months later in a Series A fundraising round led by Tencent, China’s leading Internet service provider, and Felicis Ventures.

Diffbot had already become profitable by then, and Tung says the company is still operating past the break-even point based on revenue from customers. He didn’t disclose current revenue or how far in the black Diffbot is, but says the company has more money in the bank now than when it raised its Series A. While Tung doesn’t rule out seeking more investor money at some point, he says Google is his role model for creating a business that never needed to raise money after its Series A.

One of Tung’s key goals was to prove that an AI business could be self-sustaining. For many companies, he says, it’s hard to justify using AI rather than hiring people to tackle tasks with the necessary intelligence. “We’ve reached that kind of human-level [intelligence], which is a real inflection point,” he says.

In Tung’s vision of the future, human beings won’t spend time sifting through mountains of data, trying to determine what’s true. “AI is so much better at doing that,” he says. Instead, people will spend time analyzing data and using it for work higher up on the value chain.

Diffbot left its bungalow in Palo Alto last year for a house-like building in Mountain View, and now has two separate data centers in Fremont, CA, with thousands of CPU cores—almost all of its machines hand-assembled at Diffbot’s office.

Diffbot has kept its operation lean, but it’s now planning to scale up its staff from 28 employees to 75 over the next 18 months, to add more researchers and engineers, Tung says. The company will also hire sales staff for the first time, rather than continuing to rely on inbound contacts.

The company’s outreach will target “business platforms that use data as a source for their end users,” Tung says. Data firms that serve the financial sector, such as Crunchbase, are already customers. The knowledge graph could also provide the data foundation for financial analysts who need to keep track of fund managers, and for marketers who want to find all vendors selling a certain brand of sneakers, say, to make sure they’re not violating retail price agreements or making false claims, he says.

Now that the knowledge graph is open to the public, a very small company could take advantage of a low-cost way in, at a monthly charge of $129. Such users would need to master Diffbot’s query language—a precise way of asking questions, Tung says. Subscription packages at higher rates come with more staff support from Diffbot, which will negotiate

Author: Bernadette Tansey

Bernadette Tansey is a former editor of Xconomy San Francisco. She has covered information technology, biotechnology, business, law, environment, and government as a Bay Area journalist. She has written about edtech, mobile apps, social media startups, and life sciences companies for Xconomy, and tracked the adoption of Web tools by small businesses for CNBC. She was a biotechnology reporter for the business section of the San Francisco Chronicle, where she also wrote about software developers and early commercial companies in nanotechnology and synthetic biology.