Interobserver Agreement and Performance of Concurrent AI Assistance for Radiographic Evaluation of Knee Osteoarthritis

Mathias W Brejnebøl*, Anders Lenskjold, Katharina Ziegeler, Huib Ruitenbeek, Felix C Müller, Janus U Nybing, Jacob J Visser, Loes M Schiphouwer, Jorrit Jasper, Behschad Bashian, Haoyin Cao, Maximilian Muellner, Sebastian A Dahlmann, Dimitar I Radev, Ann Ganestam, Camilla T Nielsen, Carsten U Stroemmen, Edwin H G Oei, Kay-Geert A Hermann, Mikael Boesen

*Corresponding author for this work

Abstract

Background: Due to conflicting findings in the literature, there are concerns about a lack of objectivity in grading knee osteoarthritis (KOA) on radiographs.

Purpose: To examine how artificial intelligence (AI) assistance affects the performance and interobserver agreement of radiologists and orthopedists of various experience levels when evaluating KOA on radiographs according to the established Kellgren-Lawrence (KL) grading system.

Materials and Methods: In this retrospective observer performance study, consecutive standing knee radiographs from patients with suspected KOA were collected at three participating European centers between April 2019 and May 2022. Each center recruited four readers across radiology and orthopedic surgery at in-training and board-certified experience levels. Readers assigned KL grades (KL-0 = no KOA; KL-4 = severe KOA) on the frontal view, both with and without assistance from a commercial AI tool. The majority vote of three musculoskeletal radiology consultants established the reference standard. The ordinal receiver operating characteristic (ROC) method was used to estimate grading performance, Light kappa was used to estimate interrater agreement, and bootstrapped t statistics were used to compare groups.

Results: Seventy-five studies were included from each center, totaling 225 studies (mean patient age, 55 years ± 15 [SD]; 113 female patients). The reference KL grade distribution was KL-0, 24.0% (n = 54); KL-1, 28.0% (n = 63); KL-2, 21.8% (n = 49); KL-3, 18.7% (n = 42); and KL-4, 7.6% (n = 17). Eleven of the 12 recruited readers completed their readings. Three of the six junior readers showed higher KL grading performance with AI assistance than without (area under the ROC curve without vs with assistance: 0.81 ± 0.017 [SEM] vs 0.88 ± 0.011 [P < .001]; 0.76 ± 0.018 vs 0.86 ± 0.013 [P < .001]; and 0.89 ± 0.011 vs 0.91 ± 0.009 [P = .008]). Interobserver agreement for KL grading among all readers was also higher with AI assistance than without (κ = 0.77 ± 0.018 [SEM] without vs 0.85 ± 0.013 with; P < .001). Board-certified radiologists achieved almost perfect agreement for KL grading when assisted by AI (κ = 0.90 ± 0.01), which was higher than that achieved by the reference readers independently (κ = 0.84 ± 0.017; P = .01).

Conclusion: AI assistance increased junior readers' radiographic KOA grading performance and increased interobserver agreement for osteoarthritis grading across all readers and experience levels.

Published under a CC BY 4.0 license. Supplemental material is available for this article.
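For context, Light kappa, the interrater statistic named in the abstract, is commonly computed as the mean of Cohen kappa over all reader pairs. The sketch below illustrates that idea on simulated KL grades, including a simple bootstrap over studies to mimic the abstract's ± [SEM] values; the simulated reader behavior, the unweighted kappa, the bootstrap settings, and the use of scikit-learn's cohen_kappa_score are illustrative assumptions, not the study's actual analysis code.

    # Minimal sketch: Light kappa = mean pairwise Cohen kappa across readers.
    # All data here are simulated; the study's analysis code is not public.
    from itertools import combinations

    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    rng = np.random.default_rng(0)
    n_studies, n_readers = 225, 11           # matches the study's sample and reader count
    base = rng.integers(0, 5, n_studies)     # hypothetical reference KL grades (0-4)

    # Simulate readers who mostly agree with the reference but sometimes deviate.
    ratings = np.array([
        np.where(rng.random(n_studies) < 0.8, base, rng.integers(0, 5, n_studies))
        for _ in range(n_readers)
    ])

    # Light kappa: average Cohen kappa over all unordered reader pairs.
    pairwise = [
        cohen_kappa_score(ratings[i], ratings[j])
        for i, j in combinations(range(n_readers), 2)
    ]
    print(f"Light kappa across {n_readers} readers: {np.mean(pairwise):.2f}")

    # Bootstrapped spread: resample studies with replacement and recompute.
    boot = []
    for _ in range(200):
        idx = rng.integers(0, n_studies, n_studies)
        boot.append(np.mean([
            cohen_kappa_score(ratings[i][idx], ratings[j][idx])
            for i, j in combinations(range(n_readers), 2)
        ]))
    print(f"Bootstrap SEM of Light kappa: {np.std(boot):.3f}")

Averaging pairwise kappas (rather than reporting a single multi-rater statistic such as Fleiss kappa) is what distinguishes Light's approach; it also makes per-pair agreement available for inspection.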

Original language: English
Article number: e233341
Journal: Radiology
Volume: 312
Issue number: 1
Pages (from-to): e233341
ISSN: 0033-8419
DOIs
Publication status: Published - Jul 2024

Keywords

  • Humans
  • Female
  • Observer Variation
  • Male
  • Osteoarthritis, Knee/diagnostic imaging
  • Middle Aged
  • Retrospective Studies
  • Artificial Intelligence
  • Radiography/methods
  • Aged
