TY - JOUR
T1 - Commercially Available Chest Radiograph AI Tools for Detecting Airspace Disease, Pneumothorax, and Pleural Effusion
AU - Lind Plesner, Louis
AU - Müller, Felix C
AU - Brejnebøl, Mathias W
AU - Laustrup, Lene C
AU - Rasmussen, Finn
AU - Nielsen, Olav W
AU - Boesen, Mikael
AU - Brun Andersen, Michael
PY - 2023/9
Y1 - 2023/9
N2 - Background Commercially available artificial intelligence (AI) tools can assist radiologists in interpreting chest radiographs, but their real-life diagnostic accuracy remains unclear. Purpose To evaluate the diagnostic accuracy of four commercially available AI tools for detection of airspace disease, pneumothorax, and pleural effusion on chest radiographs. Materials and Methods This retrospective study included consecutive adult patients who underwent chest radiography at one of four Danish hospitals in January 2020. Two thoracic radiologists (or three, in cases of disagreement) who had access to all previous and future imaging labeled chest radiographs independently for the reference standard. Area under the receiver operating characteristic curve, sensitivity, and specificity were calculated. Sensitivity and specificity were additionally stratified according to the severity of findings, number of findings on chest radiographs, and radiographic projection. The χ2 and McNemar tests were used for comparisons. Results The data set comprised 2040 patients (median age, 72 years [IQR, 58-81 years]; 1033 female), of whom 669 (32.8%) had target findings. The AI tools demonstrated areas under the receiver operating characteristic curve ranging 0.83-0.88 for airspace disease, 0.89-0.97 for pneumothorax, and 0.94-0.97 for pleural effusion. Sensitivities ranged 72%-91% for airspace disease, 63%-90% for pneumothorax, and 62%-95% for pleural effusion. Negative predictive values ranged 92%-100% for all target findings. In airspace disease, pneumothorax, and pleural effusion, specificity was high for chest radiographs with normal or single findings (range, 85%-96%, 99%-100%, and 95%-100%, respectively) and markedly lower for chest radiographs with four or more findings (range, 27%-69%, 96%-99%, 65%-92%, respectively) (P < .001). AI sensitivity was lower for vague airspace disease (range, 33%-61%) and small pneumothorax or pleural effusion (range, 9%-94%) compared with larger findings (range, 81%-100%; P value range, > .99 to < .001). Conclusion Current-generation AI tools showed moderate to high sensitivity for detecting airspace disease, pneumothorax, and pleural effusion on chest radiographs. However, they produced more false-positive findings than radiology reports, and their performance decreased for smaller-sized target findings and when multiple findings were present. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Yanagawa and Tomiyama in this issue.
AB - Background Commercially available artificial intelligence (AI) tools can assist radiologists in interpreting chest radiographs, but their real-life diagnostic accuracy remains unclear. Purpose To evaluate the diagnostic accuracy of four commercially available AI tools for detection of airspace disease, pneumothorax, and pleural effusion on chest radiographs. Materials and Methods This retrospective study included consecutive adult patients who underwent chest radiography at one of four Danish hospitals in January 2020. Two thoracic radiologists (or three, in cases of disagreement) who had access to all previous and future imaging labeled chest radiographs independently for the reference standard. Area under the receiver operating characteristic curve, sensitivity, and specificity were calculated. Sensitivity and specificity were additionally stratified according to the severity of findings, number of findings on chest radiographs, and radiographic projection. The χ2 and McNemar tests were used for comparisons. Results The data set comprised 2040 patients (median age, 72 years [IQR, 58-81 years]; 1033 female), of whom 669 (32.8%) had target findings. The AI tools demonstrated areas under the receiver operating characteristic curve ranging 0.83-0.88 for airspace disease, 0.89-0.97 for pneumothorax, and 0.94-0.97 for pleural effusion. Sensitivities ranged 72%-91% for airspace disease, 63%-90% for pneumothorax, and 62%-95% for pleural effusion. Negative predictive values ranged 92%-100% for all target findings. In airspace disease, pneumothorax, and pleural effusion, specificity was high for chest radiographs with normal or single findings (range, 85%-96%, 99%-100%, and 95%-100%, respectively) and markedly lower for chest radiographs with four or more findings (range, 27%-69%, 96%-99%, 65%-92%, respectively) (P < .001). AI sensitivity was lower for vague airspace disease (range, 33%-61%) and small pneumothorax or pleural effusion (range, 9%-94%) compared with larger findings (range, 81%-100%; P value range, > .99 to < .001). Conclusion Current-generation AI tools showed moderate to high sensitivity for detecting airspace disease, pneumothorax, and pleural effusion on chest radiographs. However, they produced more false-positive findings than radiology reports, and their performance decreased for smaller-sized target findings and when multiple findings were present. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Yanagawa and Tomiyama in this issue.
KW - Adult
KW - Humans
KW - Female
KW - Aged
KW - Artificial Intelligence
KW - Pneumothorax/diagnostic imaging
KW - Retrospective Studies
KW - Deep Learning
KW - Radiography, Thoracic/methods
KW - Sensitivity and Specificity
KW - Pleural Effusion/diagnostic imaging
UR - http://www.scopus.com/inward/record.url?scp=85174232403&partnerID=8YFLogxK
U2 - 10.1148/radiol.231236
DO - 10.1148/radiol.231236
M3 - Journal article
C2 - 37750768
SN - 0033-8419
VL - 308
SP - e231236
JO - Radiology
JF - Radiology
IS - 3
M1 - e231236
ER -