Screening mammography has been widely implemented to prevent death from breast cancer. In women who attend screening, breast cancer mortality is reduced by approximately 40%. Screening all women, however, creates a major workload and is threatened by workforce shortages: in the Netherlands alone, about 1 million women are screened each year. Because reading screening mammograms is error-prone, many countries have implemented double-reading policies, which double the workload and unfortunately still do not always prevent cancers from being missed.
Artificial intelligence (AI) has made it possible to evaluate mammograms in an automated manner, without human intervention. However, before such AI systems can be implemented, it must be established that automated evaluation of a screening mammogram is not worse than evaluation by radiologists.
To answer that question, an international expert panel led by Ritse Mann, from the department of medical imaging of the Radboudumc, and Linda Moy, from the department of radiology of New York University, performed a meta-analysis of current studies that allowed direct comparison of the stand-alone performance of AI systems with that of radiologists. The results of this study were published May 23 in Radiology. The analysis distinguished two types of studies: reader studies, in which a relatively small sample of mammograms was read by both the AI and radiologists, and retrospective cohort studies, in which the AI was applied to large consecutive screening sets and compared with the radiologists’ historical evaluations. The study showed that, based on evaluations of 1,108,328 mammograms in 497,091 women, current AI programs for mammography outperform humans in reader studies and, more importantly, are on par with radiologists’ performance in screening practice.
The authors conclude that the study results imply that AI is now ready to be tested in practice. To assess the influence of AI on human performance, prospective randomized trials may be useful. For actual implementation, it is crucial that feedback systems on key outcome measures, such as cancer detection and the frequency of false-positive findings, are in place to ensure continuous quality control. When this is guaranteed, current AI has the potential to strongly reduce radiologists’ workload in screening, while the quality of screening programs will remain high and might even improve.
Example of an AI-detected breast cancer in the left breast that was missed by radiologists. Note that small cancers can easily be hidden by the normal fibroglandular tissue in the breast (the white structures in both breasts).