I had an interesting chat yesterday with Ying Shan, an applied researcher at Microsoft’s adCenter Labs in Redmond. Formed in 2006, the 120-strong lab is dedicated to researching and incubating new digital advertising technologies, primarily to compete with Google’s AdSense in online contextual advertising.
Shan, an expert in computer vision and machine learning, joined Microsoft (NASDAQ: [[ticker:MSFT]]) last year from Sarnoff in Princeton, NJ. He just got back from chilly Anchorage, AK, where last week he was one of the organizers of a pioneering workshop on “Internet vision”—the melding of computer vision with all the data and technologies of the Web. “This is a really new field,” he says. (The workshop was part of the 2008 IEEE Conference on Computer Vision and Pattern Recognition.)
The Internet vision workshop was notable not only because it was attended by some 800 people, but also because the dozens of participants and committee members represented the major companies that compete (or might want to compete) in the online-ad space: Alex Berg and Tamara Berg from Yahoo Research, Hartmut Neven from Google, Ed Chang from Google China, Eric Hanning Zhou from Amazon, Yingli Tian from IBM Research, Shai Avidan from Adobe Research, Simon Baker from Microsoft Research, Harry Shum from Microsoft… and the list goes on.
A key topic at the meeting, and the one that caught my ear, was how to use computer-vision technology to recognize objects or people in online videos and automatically generate contextual ads. Right now almost all online ads are based on text: software extracts keywords from a page, finds the most relevant ads, and puts them up alongside the page. Nobody has found a good way to do this for video yet, because “image understanding” has been a tough, unsolved technical problem for more than 30 years. But if someone could solve it… “That will be a huge, huge thing,” says Shan.
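For a sense of how simple the text side is, here is a minimal sketch of a keyword-based ad matcher in Python. The inventory, stopword list, and frequency heuristic are all hypothetical stand-ins for illustration, not how any real ad platform works.

```python
import re
from collections import Counter

# Hypothetical ad inventory: keyword -> ad headlines. A real system
# would match against millions of advertiser-supplied keywords.
AD_INVENTORY = {
    "camera": ["Nikon D60 Sale", "Canon Lens Outlet"],
    "travel": ["Cheap Flights", "Alaska Cruise Deals"],
}

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "our"}

def extract_keywords(page_text, top_n=5):
    """Crude keyword extraction: the most frequent non-stopword terms."""
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

def select_ads(page_text):
    """Collect ads whose keyword appears among the page's top terms."""
    ads = []
    for keyword in extract_keywords(page_text):
        ads.extend(AD_INVENTORY.get(keyword, []))
    return ads

page = "Our travel blog: camera reviews and travel tips for your camera bag"
print(select_ads(page))  # matches both "travel" and "camera" ads
```

With video there is no such text to mine, which is exactly the gap the workshop was circling.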
(Some figures: last year, eMarketer projected that online video advertising in the U.S. would grow from $775 million in 2007 to more than $4 billion in 2011.)
So what kind of progress was made at the workshop? Shan gave me a hypothetical example of how video ads might work. Say there’s software that recognizes that Paris Hilton is in a video clip, perhaps by running face recognition against a database that users have hand-labeled. A company could then put up ads relevant to the celebrity lifestyle, high-end shopping, and so forth (see the sketch below). Whoever figures out how to do this reliably, without making many recognition errors, stands to gain a lot.
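To make Shan’s hypothetical concrete, here is one way the matching step might look: a nearest-neighbor search over face embeddings from a hand-labeled gallery. Everything in this sketch (the gallery, the vectors, the ad mapping, the threshold) is an invented illustration, and it assumes some upstream model has already converted detected faces into embedding vectors.

```python
import numpy as np

# Hypothetical hand-labeled gallery: celebrity name -> face embedding.
# In practice these vectors would come from a trained face-recognition
# model run over user-labeled photos.
GALLERY = {
    "Paris Hilton": np.array([0.9, 0.1, 0.3]),
    "Some Athlete": np.array([0.2, 0.8, 0.5]),
}

# Hypothetical mapping from a recognized celebrity to ad categories.
CELEBRITY_ADS = {
    "Paris Hilton": ["luxury hotels", "designer handbags"],
}

def identify(face_embedding, threshold=0.25):
    """Nearest-neighbor match against the labeled gallery.

    Returns the closest name, or None if nothing in the gallery is
    within the distance threshold (to limit false positives).
    """
    best_name, best_dist = None, float("inf")
    for name, ref in GALLERY.items():
        dist = np.linalg.norm(face_embedding - ref)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

def ads_for_face(face_embedding):
    name = identify(face_embedding)
    return CELEBRITY_ADS.get(name, []) if name else []

print(ads_for_face(np.array([0.88, 0.12, 0.31])))  # close match -> her ads
```

The distance threshold is where the reliability question bites: loosen it and you match more faces but serve more embarrassingly wrong ads.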
Some researchers at the workshop showed concrete results in this direction. Xian-Sheng Hua of Microsoft Research Asia presented research software that used object-recognition algorithms to detect a car in a video clip, and then displayed ads for cars and other vehicles alongside the page. And in an invited talk, Neven from Google showed progress made in recognizing people and places in online photo collections.
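Hua’s system wasn’t described in implementation detail, but the downstream step, turning per-frame detections into an ad decision, might look roughly like this sketch. The frame labels, the ad mapping, and the repeated-sighting rule are my assumptions for illustration; a real system would get the labels from a trained detector run over sampled frames.

```python
from collections import Counter

# Hypothetical mapping from a detected object to ad categories.
OBJECT_ADS = {"car": ["new sedans", "auto loans"], "dog": ["pet food"]}

def ads_for_clip(frame_labels, min_hits=3):
    """Pick ads for a clip from per-frame detector output.

    frame_labels holds one list of detected object labels per sampled
    frame. Requiring an object to show up in several frames smooths
    over single-frame recognition errors.
    """
    tally = Counter()
    for labels in frame_labels:
        tally.update(set(labels))  # count each object once per frame
    return [ad for label, count in tally.items()
            if count >= min_hits
            for ad in OBJECT_ADS.get(label, [])]

clip = [["car"], ["car", "tree"], ["car"], ["tree"]]
print(ads_for_clip(clip))  # "car" seen in 3 frames -> vehicle ads
```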
Shan stresses that none of these are upcoming products. And judging by the roster of corporate competitors at the meeting, I have to believe that anything presented was either very general or still at the early-research stage. Nevertheless, it’s a snapshot of the possible future of online ads. “We focused on fundamental issues and basic problems, not too much about specifics,” says Shan. “We try not to ask each other about products and scenarios.”