Diffbot Turns Online Comments Into Market Intelligence Databases

Let’s say you’re a tech entrepreneur whose groundbreaking gizmo got rave reviews in influential publications, spurring months of lucrative sales. But later, sales slide, and you don’t know why.

Turns out, a rumor is circulating in discussion forums that your product has a high failure rate after six months of use—a mistaken impression or a deliberate lie. How would you detect those inaccurate comments as soon as they appear, so you can publicly refute them?

It’s the kind of challenge that Diffbot, a small Palo Alto, CA-based artificial intelligence company, set out to help companies meet. Diffbot announced today that it has created a new search tool, Discussions API, that digs for product mentions in comment threads, community forums and online reviews—the hidden crannies of the “deep Web” it says Google doesn’t fully explore.

“Engaged commenters could be misrepresenting a brand,” Diffbot product executive John Davi says. “Now that’s easy to find.”

While traditional search engines routinely pull up lists of published articles about products, Diffbot indexes the online conversation that follows below those stories. The volume of this user-generated commentary could be as much as 400 times larger than the “surface Web” of mainstream media, organizational websites, and other content easily accessible with conventional search engines, Diffbot says.

While companies often search Twitter feeds as part of their media monitoring routines, Davi says some of the most influential consumer backchat may take place in other forums where writers aren’t limited to 140-character messages. And those other comment sections don’t have their own built-in search functions, as Twitter does. Diffbot is now making those forums searchable.

While a Google keyword search might turn up a few individual comments found on the Web, Diffbot scans the discussion sections across multiple websites and returns the results to the customer in a database format that summarizes many comments, Davi says. Using artificial intelligence and automated crawling, Diffbot extracts key details from each comment, including the author, the author URL, the site where the comment appeared, and the nature of the opinion expressed.
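To make the idea of a "database format" concrete, here is a minimal sketch in Python of how extracted comments might be stored and summarized. It is not Diffbot's actual schema or API: the `CommentRecord` class, its field names, the sentiment labels, and the sample data are all assumptions made for illustration, loosely based on the details Davi lists.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class CommentRecord:
    """One extracted comment. Field names are hypothetical, not Diffbot's schema."""
    author: str
    author_url: str
    site: str
    sentiment: str  # assumed labels: "positive", "negative", "neutral"
    text: str


def summarize_by_site(records):
    """Count comments per sentiment label for each site."""
    summary = defaultdict(Counter)
    for record in records:
        summary[record.site][record.sentiment] += 1
    return {site: dict(counts) for site, counts in summary.items()}


def flag_negative(records, keyword):
    """Return negative comments mentioning a keyword, such as a rumored defect."""
    needle = keyword.lower()
    return [
        r for r in records
        if r.sentiment == "negative" and needle in r.text.lower()
    ]


# Made-up sample data for illustration only.
SAMPLE = [
    CommentRecord("ann", "https://example.com/u/ann", "reddit.com",
                  "negative", "Mine had a failure after six months."),
    CommentRecord("bo", "https://example.com/u/bo", "reddit.com",
                  "positive", "Best gizmo I have owned."),
    CommentRecord("cy", "https://example.com/u/cy", "disqus.com",
                  "positive", "Works great so far."),
]

if __name__ == "__main__":
    print(summarize_by_site(SAMPLE))
    for hit in flag_negative(SAMPLE, "failure"):
        print(hit.site, hit.author, hit.text)
```

In the scenario that opens this story, a query like `flag_negative(records, "failure")` is the sort of lookup that could surface the damaging rumor early, while the per-site summary shows where opinion is trending.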

The database can identify positive trends in public opinion as well as damaging misconceptions, Davi says. Product makers gain the ability to reach out to consumers whose favorable comments could be used in company marketing campaigns, he says.

Discussions API searches opinion forums including Facebook Comments, Disqus, WordPress, Blogger, and Reddit.

“We actually expect this to be a pretty hot commodity,” Davi says. The idea to search comment sections arose both from Diffbot’s staff and from its existing customers, he says.

Diffbot’s new Discussions tool builds on other innovations the company has made to expand the searchable territory on the Web. Founded in 2009, Diffbot first attacked a blind spot in the way a robot brain “reads” a website. Unlike a human being, a traditional search engine can’t distinguish well between the different sections of a webpage, such as a story headline, the author’s byline, an image, and the body of the story.

Using machine learning and human trainers, Diffbot taught its robot brain to recognize these layout components in a variety of different webpage types, such as a front page or home page, an article, an e-commerce site displaying product information, and pages containing images or videos.

Customers such as Instapaper use this Diffbot layout-reading function to reformat Web content for use on mobile devices. Diffbot automatically reshuffles the positions of headlines, text blocks, and other components to fit the different dimensions of a smartphone or tablet screen.

But Diffbot also uses the layout-reading function to find data such as product prices, because its robot brain now knows which page areas to search for that information. Customers like Pinterest might use the price information Diffbot extracts from multiple locations to help their users with comparison shopping. Product wholesalers use Diffbot’s aggregate reports of current price data to make sure stores aren’t violating agreements to charge customers the manufacturer’s suggested retail price, Davi says.
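The price-monitoring use case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `(store, price)` input format, and the sample listings are hypothetical, and the sketch assumes a violation means a listed price below the suggested price. How Diffbot's customers actually define or check violations is not described here.

```python
from decimal import Decimal


def find_price_violations(listings, suggested_price):
    """Return (store, price) pairs listed below the suggested retail price.

    `listings` is an iterable of (store, price_string) pairs; this input
    format is an assumption made for illustration.
    """
    floor = Decimal(suggested_price)
    return [
        (store, Decimal(price))
        for store, price in listings
        if Decimal(price) < floor
    ]


# Made-up sample data for illustration only.
LISTINGS = [
    ("store-a", "199.99"),
    ("store-b", "179.00"),
    ("store-c", "199.99"),
]

if __name__ == "__main__":
    for store, price in find_price_violations(LISTINGS, "199.99"):
        print(f"{store} lists the product at {price}")
```

Prices are parsed as `Decimal` values rather than floats so that comparisons between amounts such as 199.99 are exact.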

“We’ve built a fairly tidy business on that,” Davi says of the page-reading capabilities that were Diffbot’s first commercial services. He declined to disclose company revenue numbers. Customers of the 12-employee company include Adobe, CBS Interactive, Cisco, eBay, Salesforce, and Samsung.

Diffbot has raised a total of $2 million since it was founded, and doesn’t need to raise capital for a while, Davi says.

“We’ve had break-even months, but then we keep hiring people,” he says.

Author: Bernadette Tansey

Bernadette Tansey is a former editor of Xconomy San Francisco. She has covered information technology, biotechnology, business, law, environment, and government as a Bay Area journalist. She has written about edtech, mobile apps, social media startups, and life sciences companies for Xconomy, and tracked the adoption of Web tools by small businesses for CNBC. She was a biotechnology reporter for the business section of the San Francisco Chronicle, where she also wrote about software developers and early commercial companies in nanotechnology and synthetic biology.