Why it’s so hard to use AI to diagnose cancer

In theory, artificial intelligence should be great at helping out. “Our job is pattern recognition,” says Andrew Norgan, a pathologist and medical director of the Mayo Clinic’s digital pathology platform. “We look at the slide and we gather pieces of information that have been proven to be important.” 

Visual analysis is something that AI has gotten quite good at since the first image recognition models began taking off nearly 15 years ago. Even though no model will be perfect, you can imagine a powerful algorithm someday catching something that a human pathologist missed, or at least speeding up the process of getting a diagnosis. We’re starting to see lots of new efforts to build such a model—at least seven attempts in the last year alone—but they all remain experimental. What will it take to make them good enough to be used in the real world?

Details about the latest effort to build such a model, led by the AI health company Aignostics with the Mayo Clinic, were published on arXiv earlier this month. The paper has not been peer-reviewed, but it reveals much about the challenges of bringing such a tool to real clinical settings. 

The model, called Atlas, was trained on 1.2 million tissue samples from 490,000 cases. Its accuracy was tested against six other leading AI pathology models. These models compete on shared tests like classifying breast cancer images or grading tumors, where the model's predictions are compared with the correct answers given by human pathologists. Atlas beat rival models on six out of nine tests. It earned its highest score for categorizing cancerous colorectal tissue, reaching the same conclusion as human pathologists 97.1% of the time. For another task, though—classifying tumors from prostate cancer biopsies—Atlas still outscored the other models, but with an accuracy of just 70.5%. Its average across nine benchmarks showed that it got the same answers as human experts 84.6% of the time. 
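At its core, that comparison is a simple agreement calculation: for each benchmark, count how often the model's label matches the pathologist's, then average across benchmarks. The Python sketch below illustrates the idea; the benchmark names and labels are hypothetical toy data, not the actual evaluation code or results from the Atlas paper.

```python
# Minimal sketch of per-benchmark agreement with pathologist labels.
# Benchmark names, labels, and predictions are hypothetical illustrations,
# not data from the Atlas or Mayo Clinic evaluation.

def agreement(model_preds, pathologist_labels):
    """Fraction of cases where the model's label matches the pathologist's."""
    matches = sum(p == t for p, t in zip(model_preds, pathologist_labels))
    return matches / len(pathologist_labels)

# Toy data: (model predictions, pathologist labels) per benchmark.
benchmarks = {
    "colorectal_tissue": (["tumor", "normal", "tumor", "tumor"],
                          ["tumor", "normal", "tumor", "normal"]),
    "prostate_grading":  (["grade3", "grade4", "grade3"],
                          ["grade3", "grade5", "grade4"]),
}

scores = {name: agreement(preds, labels)
          for name, (preds, labels) in benchmarks.items()}
average = sum(scores.values()) / len(scores)

for name, score in scores.items():
    print(f"{name}: {score:.1%} agreement with pathologists")
print(f"Average across benchmarks: {average:.1%}")
```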

Let’s think about what this means. The best way to know what’s happening to cancerous cells in tissues is to have a sample examined by a pathologist, so that’s the performance that AI models are measured against. The best models are approaching humans in particular detection tasks but lagging behind in many others. So how good does a model have to be to be clinically useful?

“Ninety percent is probably not good enough. You need to be even better,” says Carlo Bifulco, chief medical officer at Providence Genomics and co-creator of GigaPath, one of the other AI pathology models examined in the Mayo Clinic study. But, Bifulco says, AI models that don’t score perfectly can still be useful in the short term, and could potentially help pathologists speed up their work and make diagnoses more quickly.    

What obstacles are getting in the way of better performance? Problem number one is training data.

“Fewer than 10% of pathology practices in the US are digitized,” Norgan says. That means tissue samples are placed on slides and analyzed under microscopes, and then stored in massive registries without ever being documented digitally. Though European practices tend to be more digitized, and there are efforts underway to create shared data sets of tissue samples for AI models to train on, there’s still not a ton to work with. 
