This story is part of an Xconomy series on artificial intelligence in healthcare. Some of the other stories cover a genomics hackathon, A.I. and radiology, and the impact on doctors and patients.
In the classic 1967 film “The Graduate,” Dustin Hoffman’s just-out-of-college character gets one word of career advice from a family friend: plastics.
In healthcare, there’s a growing rumble of advice about which career not to go into: radiology. In the next five or ten years, some observers say, machines will replace the human experts who read our medical images for signs of cancer and broken bones. That was the estimate from Geoffrey Hinton, an artificial intelligence pioneer, at a conference last year.
Now, two grassroots A.I. competitions—one just finished, one ongoing—want to find out how close we are to that day. In each case, teams from around the world have designed software that can digest hundreds of thousands of lung and breast scans, with the goal of predicting whether an odd spot is in fact a dangerous tumor, a tumor that can be left alone, or something else entirely—scarring, for example, or a flaw in the image—that shouldn’t require more medical procedures.
It’s a more subtle and less explored task for current A.I. systems than, say, distinguishing between a dog and a cat or picking out someone’s face on social media.
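In skeleton form, though, the software resembles any other image classifier. The sketch below, in PyTorch, shows a tiny three-way classifier of the sort that, scaled up considerably, underlies these systems; the class labels, layer sizes, and input shape are illustrative assumptions, not any contestant’s actual entry.

```python
import torch
import torch.nn as nn

# Illustrative three-way classifier: is a suspicious spot a dangerous tumor,
# a benign finding, or an imaging artifact? Labels and sizes are placeholders.
CLASSES = ["dangerous tumor", "benign finding", "artifact/other"]

class ScanClassifier(nn.Module):
    def __init__(self, num_classes: int = len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 channel: grayscale scan patch
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dimensions
            nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

model = ScanClassifier()
patch = torch.randn(1, 1, 64, 64)            # a fake 64x64 grayscale patch
probs = torch.softmax(model(patch), dim=1)   # probabilities over the three classes
print(dict(zip(CLASSES, probs[0].tolist())))
```

Real entries worked on full three-dimensional CT volumes or high-resolution mammograms and leaned on much deeper networks, but the basic move is the same: turn pixels into a probability for each possible explanation of a spot.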
Getting these calls right is also an urgent problem—hence the $1 million or more at stake in each competition, some of the richest awards of their kind. “We’ve run hundreds of competitions, and this is our largest prize,” says Anthony Goldbloom, CEO of Kaggle, a data competition group and Google subsidiary that organizes the annual Data Science Bowl along with the management consulting firm Booz Allen Hamilton.
The Laura and John Arnold Foundation put up the money for both competitions: $1 million for this year’s Data Science Bowl, focused on lung cancer prediction; and $1.2 million for the Digital Mammography DREAM Challenge, for breast cancer screening.
Each contest came about because the rate of false positives from human radiologists—flagging something that turns out not to be cancer—seems unacceptably high (more than 90 percent for lung scans, and about 50 percent for mammography, according to the National Cancer Institute).
False positives often lead to more procedures, higher costs, patient anxiety, and not uncommonly, big health risks. You really don’t want a lung biopsy if you don’t need one, but biopsies often follow a scan that finds a nodule that’s neither small enough to leave alone nor large enough to raise a red flag.
Eric Stern, a Seattle-based radiologist who was an unpaid advisor to the Data Science Bowl lung challenge, says advanced software should be an important tool for radiologists, not a threat to them. “Humans can’t easily go into public health data and correlate trends” such as smoking history, other health problems, diet, and exercise to better understand “which lesions are more likely to be cancer,” says Stern. “That to me is where ‘big data’ has the most potential benefit.”
Stern, who practices at the University of Washington, helped shape the format of the lung challenge—to a certain extent. He wanted the contest designers to include smoking history and other so-called metadata about the patients, but in the end the contestants could only train their A.I. systems on the lung images themselves. Stern says radiologists have had “40 years of looking” at images: “We in the radiologist community weren’t optimistic” that the contest would lead to new diagnostic powers without building in metadata, he adds.
The Data Science Bowl challenge, which was held this spring, had another shortcoming: It only included one image per patient, even though a radiologist looks at a patient’s images over time. “In reality that’s how imaging diagnosis is done,” says Keyvan Farahani of the National Cancer Institute, which provided the lung images for the contest. When asked if machine learning isn’t ready to deal with data sets that contain multiple images of the same person over time, Farahani says this: “I don’t want to say it’s not ready. I want to say it hasn’t been explored.”
The top 10 finishers split most of the $1 million prize money. But it’s hard to know how well they did if you don’t speak A.I. The scores (winning number: 0.39975) reflect how well the algorithms predicted a cancer diagnosis—but only relative to one another. They still need to be translated into a figure that shows prediction power relative to radiologists themselves. The NCI is working on that, says Farahani. But the results aren’t likely to scare radiologists into job retraining anytime soon.
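For the record, the leaderboard numbers are on a log-loss-style scale, where lower is better and the score rewards well-calibrated probabilities rather than simple right-or-wrong calls. Here is a minimal sketch of how such a score behaves, assuming a plain binary log loss (which may differ in detail from the contest’s exact metric) and using made-up numbers:

```python
import numpy as np

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Average log loss: heavily penalizes confident predictions that are wrong.
    Lower is better; 0 would mean perfect, fully confident predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

# Made-up example: 1 = patient diagnosed with cancer, 0 = not.
labels = [1, 0, 0, 1, 0]

# Two hypothetical algorithms that make the same calls at a 50% threshold,
# but with different degrees of confidence.
cautious = [0.60, 0.40, 0.35, 0.55, 0.30]
confident = [0.90, 0.10, 0.05, 0.85, 0.10]

print(binary_log_loss(labels, cautious))   # roughly 0.48
print(binary_log_loss(labels, confident))  # roughly 0.11
```

The point is only that a number like 0.39975 measures confidence-weighted accuracy on the contest’s own test set; translating it into a head-to-head comparison with radiologists is the separate exercise the NCI is working on.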
“This is the first step in a long chain before making real cancer diagnoses,” says Goldbloom. “We’re in the early days of this technology; it’s unclear how accurate these algorithms will get.”
Debates about the future of the radiologist’s job aside, there is little argument about the need to reduce medical error, which is the third leading cause of death in the U.S., according to a recent study.
There’s already a kind of software to help radiologists, called computer-aided diagnosis. Stern, who is on the informatics commission of the American College of Radiology, says CAD programs (not to be confused with computer-aided design) have a spotty track record. “We told the Data Science Bowl organizers we don’t need another CAD program.” The goal with newer A.I. software, it seems, is to go far beyond that.
Meanwhile, the mammography competition continues. The nonprofit Sage Bionetworks in Seattle just finished the first phase of the competition,