TY - JOUR
T1 - How does ChatGPT-4 match radiologists in detecting pulmonary congestion on chest X-ray?
AU - Overgaard Olesen, Anne Sophie
AU - Miger, Kristina Cecilia
AU - Nielsen, Olav Wendelboe
AU - Grand, Johannes
PY - 2024/6/30
Y1 - 2024/6/30
N2 - Hospitalization rates for elderly patients with dyspnea are increasing. Concurrently, the radiologist shortage challenges the initial diagnosis of acute heart failure, as the diagnosis often relies on chest X-ray evaluation. ChatGPT-4 is easily available with image interpreter features, making it a tempting supplementary tool for radiology analysis. We aimed to examine ChatGPT-4’s ability to correctly detect pulmonary congestion on chest X-rays compared to two thoracic radiologists. In a prospective observational single-center study, acute dyspneic patients were examined with chest X-rays within 4 hours of admission. For 50 chest X-rays, two blinded thoracic radiologists evaluated the likelihood of pulmonary congestion on a 5-point Likert scale. Similarly, ChatGPT-4 was prompted to evaluate the chest X-rays for pulmonary congestion, first independently, then with clinical information about medical history, clinical examination, vital parameters, and electrocardiographic (ECG) rhythm. ChatGPT-4 matched the radiologists’ evaluations with a ≤1 point discrepancy in 27 (54%) of the chest X-rays. The match rate slightly improved to 31 (62%) with provided clinical information. ChatGPT-4 accurately identified pulmonary congestion in 12 (48%) of 25 chest X-rays with pulmonary congestion and correctly detected its absence in 15 (60%) of 25 images without pulmonary congestion. In conclusion, the image interpreter features of ChatGPT-4 do not yet support reliable diagnostics of pulmonary congestion on chest X-rays.
KW - Artificial intelligence (AI)
KW - ChatGPT-4
KW - chest X-ray
KW - large language model
KW - pulmonary congestion
UR - http://www.scopus.com/inward/record.url?scp=85199667814&partnerID=8YFLogxK
U2 - 10.21037/jmai-24-26
DO - 10.21037/jmai-24-26
M3 - Journal article
AN - SCOPUS:85199667814
SN - 2617-2496
VL - 7
JO - Journal of Medical Artificial Intelligence
JF - Journal of Medical Artificial Intelligence
M1 - 18
ER -