Diffbot Raises $10M To Expand AI Engine That Mines The Web

Diffbot, an artificial intelligence company that helps clients extract and combine data from multiple Web sources, announced today it raised $10 million from investors including Tencent and Felicis Ventures to expand its “knowledge-as-a-service” offerings to businesses and consumer apps.

The Palo Alto, CA-based startup, founded in 2009, still has a tiny staff of 14. But Diffbot’s ambition is huge: to catalog trillions of facts across the Web—many of them drawn from page elements such as comment forums, which can’t be mined by traditional search engines. The startup says it has made a significant start on that goal, having indexed 1.2 billion entities such as people, products, and places since the middle of last year. Its Global Index also encompasses 10 to 20 times that number of facts, says Diffbot founder and CEO Mike Tung. Last June, the company said its database had surpassed the size of Google’s Knowledge Graph.

Some big customers are putting Diffbot’s technology to use. Cisco and Adobe are sifting through comment sections to monitor customer feedback about their products, Tung says. Other companies use Diffbot to comb online professional profiles and assemble lists of hiring prospects. Apps powered by Diffbot can address specific consumer needs, such as comparing prices for a coveted mountain bike. Businesses also use Diffbot’s services to track their competitors and develop sales leads. Search engines such as Microsoft’s Bing use it to enhance the results of user queries.

Cybersecurity may also emerge as one of Diffbot’s client areas.

Diffbot works in any language, so it can parse message forums in Arabic, for example, Tung says. “It can tell you who the speakers are, and what they’re saying,” he says. Tung declined to name any customers Diffbot may have attracted from the information security sector. But he says the company’s technology is “sufficiently powerful to reduce information asymmetry.”

With more than 250 customers—including Amazon, CBS Interactive, eBay, Instapaper, Microsoft, Salesforce, and Samsung—Diffbot became profitable at the end of 2015, Tung says.

“We’ve proven it’s possible to build a profitable AI business model,” Tung says.

Diffbot isn’t disclosing its revenues at this point, but Tung says they cover operating expenses. These costs are low because Diffbot’s automated data collection and analysis technology requires no human curation, he says.

The startup’s key early innovation was to extend the search function into previously uncharted territory by teaching computers to recognize the various sub-sections of Web pages, including headlines, ad boxes, pictures, and discussion threads. Diffbot could then classify each page by type, such as a news article or a product page. That knowledge allows its computers to find and assemble related information, such as product prices across various retailers and consumer opinions across many social media platforms and comment sections. As the company describes it, the technology creates “structured data” that machines can read and interpret.

Diffbot has been scaling up its data center, adding to its bank of proprietary servers with specialized hardware, and integrating Web-based processing power into the system to meet surges in demand. The company’s new money will accelerate the scale-up and fund an expansion of its R&D team, Tung says.

“We wanted to see the future happen quicker,” Tung says. “Money contracts time.”

The second goal for Diffbot’s $10 million Series A financing round was to make alliances with investors experienced in artificial intelligence, Tung says. The round was led by Tencent, China’s leading Internet service provider, and Felicis Ventures. Other participants include Andy Bechtolsheim, the co-founder of Sun Microsystems; Amplify Ventures; Valor Capital; Bill Lee, an early investor in SpaceX and Tesla; and artificial intelligence expert Georges Harik, an early Google staffer.

Tencent is one of the big Asian companies investing in U.S. tech companies, both for financial and strategic reasons. Tencent is not a customer of Diffbot’s, Tung says. He adds the word “now.”

In 2015, Tencent invested in a number of Bay Area companies, including mobile game company Pocket Gems and app finder Vurb (both based in San Francisco) and Redwood City-based virtual reality startup AltspaceVR.

Tung declined to disclose the company’s valuation for the Series A round, which brings Diffbot’s total fundraising to $13 million.

The research groups at Google and Facebook are Diffbot’s closest rivals in the development of methods to gather and synthesize Web data using artificial intelligence technology, Tung says. But rather than keeping the knowledge in-house, Diffbot is making it available to outside companies, he says.

“We’re sort of like Switzerland in the AI wars,” Tung says.

Other startups are pursuing a similar AI-as-a-service model, recognizing that while the Internet giants have the resources to push the envelope in things like computer vision and natural language understanding, lots of companies can benefit from these technologies. One example is Seattle-based KITT.ai, a spinout from the Allen Institute for Artificial Intelligence that is working on natural language understanding.

Web-mining can be a competitive advantage for apps as well as the proliferating devices of the Internet of Things, Tung says.

“Everything’s becoming intelligent, but the limiting factor of intelligence is access to structured data,” Tung says.

Author: Bernadette Tansey

Bernadette Tansey is a former editor of Xconomy San Francisco. She has covered information technology, biotechnology, business, law, environment, and government as a Bay Area journalist. She has written about edtech, mobile apps, social media startups, and life sciences companies for Xconomy, and tracked the adoption of Web tools by small businesses for CNBC. She was a biotechnology reporter for the business section of the San Francisco Chronicle, where she also wrote about software developers and early commercial companies in nanotechnology and synthetic biology.