That’s because health data such as medical imaging, vital signs, and data from wearable devices can vary for reasons unrelated to a particular health condition, such as lifestyle or background noise. The machine learning algorithms popularized by the tech industry are so good at finding patterns that they can discover shortcuts to “correct” answers that won’t work out in the real world. Smaller data sets make it easier for algorithms to cheat that way and create blind spots that cause poor results in the clinic. “The community fools [itself] into thinking we’re developing models that work much better than they actually do,” Berisha says. “It furthers the AI hype.”
Berisha says that problem has led to a striking and concerning pattern in some areas of AI health care research. In studies using algorithms to detect signs of Alzheimer’s or cognitive impairment in recordings of speech, Berisha and his colleagues found that larger studies reported worse accuracy than smaller ones—the opposite of what big data is supposed to deliver. A review of studies attempting to identify brain disorders from medical scans and another of studies trying to detect autism with machine learning reported a similar pattern.
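The effect described above—that small data sets make it easy for an algorithm to "cheat," so small studies tend to report inflated accuracy—can be sketched with a toy simulation. Everything here is illustrative and not drawn from any of the studies cited: the data is pure noise, and the "model" simply picks whichever of many candidate features happens to look most predictive.

```python
import random

def best_feature_accuracy(n_samples, n_features, seed=0):
    """Generate random labels and random features, fit one simple
    threshold rule per feature on the same data, and return the best
    apparent accuracy. Since the labels carry no real signal, anything
    above 0.5 is overfitting to noise."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feats = [rng.random() for _ in range(n_samples)]
        # Split at the median value; also consider the flipped rule.
        thresh = sorted(feats)[n_samples // 2]
        preds = [1 if f > thresh else 0 for f in feats]
        acc = sum(p == y for p, y in zip(preds, labels)) / n_samples
        best = max(best, acc, 1 - acc)
    return best

# A "small study" with 20 patients vs. a "large study" with 2,000,
# each screening the same 100 noise features.
small = best_feature_accuracy(n_samples=20, n_features=100)
large = best_feature_accuracy(n_samples=2000, n_features=100)
```

With only 20 samples, some noise feature will look predictive by chance, so the small run reports accuracy well above the 50 percent that the data actually supports; with 2,000 samples the apparent accuracy collapses back toward chance—the same larger-studies-report-worse-accuracy pattern Berisha observed.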
The dangers of algorithms that work well in preliminary studies but behave differently on real patient data are not hypothetical. A 2019 study found that a system used on millions of patients to prioritize access to extra care for people with complex health problems put white patients ahead of Black patients.
Avoiding biased systems like that requires large, balanced data sets and careful testing, but skewed data sets are the norm in health AI research, due to historical and ongoing health inequalities. A 2020 study by Stanford researchers found that 71 percent of data used in studies that applied deep learning to US medical data came from California, Massachusetts, or New York, with little or no representation from the other 47 states. Low-income countries are barely represented at all in AI health care studies. A review published last year of more than 150 studies using machine learning to predict diagnoses or courses of disease concluded that most “show poor methodological quality and are at high risk of bias.”
Two researchers concerned about these shortcomings recently launched a nonprofit called Nightingale Open Science to try to improve the quality and scale of data sets available to researchers. It works with health systems to curate collections of medical images and associated data from patient records, anonymize them, and make them available for nonprofit research.
Ziad Obermeyer, a Nightingale cofounder and associate professor at the University of California, Berkeley, hopes providing access to that data will encourage competition that leads to better results, similar to how large, open collections of images helped spur advances in machine learning. “The core of the problem is that a researcher can do and say whatever they want in health data because no one can ever check their results,” he says. “The data [is] locked up.”
Nightingale joins other projects attempting to improve health care AI by boosting data access and quality. The Lacuna Fund supports the creation of machine learning data sets representing low- and middle-income countries and is working on health care; a new project at University Hospitals Birmingham in the UK with support from the National Health Service and MIT is developing standards to assess whether AI systems are anchored in unbiased data.
Mateen, editor of the UK report on pandemic algorithms, is a fan of AI-specific projects like those but says the prospects for AI in health care also depend on health systems modernizing their often creaky IT infrastructure. “You’ve got to invest there at the root of the problem to see benefits,” Mateen says.