Abstract
PURPOSE: To evaluate the performance of a custom ChatGPT-based chatbot in triaging ophthalmic emergencies compared to trained ophthalmologists.
METHODS: One hundred hypothetical ophthalmic cases were created based on actual patient data from an ophthalmic emergency department, including details such as age, symptoms and medical history. Three experienced ophthalmologists independently graded these cases using a four-tier severity scale, ranging from Grade 1 (immediate care required) to Grade 4 (non-urgent care). A customized version of ChatGPT was developed to perform the same grading task. Inter-rater agreement was measured between the chatbot and the ophthalmologists, as well as among all human graders.
RESULTS: The chatbot demonstrated substantial agreement with the ophthalmologists, achieving Cohen's kappa scores of 0.737, 0.749 and 0.751 with ophthalmologists 1, 2 and 3, respectively. The highest agreement was between ophthalmologist 3 and the chatbot (κ = 0.751). Fleiss' kappa for overall agreement among all graders was 0.79, indicating substantial agreement. The Kruskal-Wallis test showed no statistically significant differences in the distribution of grades assigned by the chatbot and the ophthalmologists (p = 0.967). Bootstrap analysis revealed no significant difference in kappa values between the chatbot and human graders (p = 0.572, 95% CI -0.163 to 0.072).
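The pairwise agreement statistic reported above, Cohen's kappa, can be illustrated with a minimal sketch. The implementation below and the grade vectors in it are hypothetical, not the study's data or code; kappa corrects the observed agreement rate for the agreement expected by chance from each rater's marginal grade frequencies.

```python
from collections import Counter

def cohen_kappa(r1, r2):
    """Cohen's kappa between two raters grading the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from the raters' marginal frequencies.
    """
    assert len(r1) == len(r2) and len(r1) > 0
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical severity grades (1 = immediate, 4 = non-urgent) for 10 cases
chatbot = [1, 2, 2, 3, 4, 1, 3, 2, 4, 3]
rater_1 = [1, 2, 3, 3, 4, 1, 3, 2, 4, 2]
print(round(cohen_kappa(chatbot, rater_1), 3))  # → 0.73
```

In practice, library routines such as `sklearn.metrics.cohen_kappa_score` (pairwise) and `statsmodels.stats.inter_rater.fleiss_kappa` (multi-rater) compute the same quantities.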
CONCLUSIONS: The study demonstrates that a customized chatbot can perform ophthalmic triage with a level of accuracy comparable to that of trained ophthalmologists. This suggests that AI-assisted triage could be a valuable tool in emergency departments, potentially enhancing clinical workflows and reducing waiting times while maintaining high standards of patient care.
| Original language | English |
|---|---|
| Article number | 20552076251320298 |
| Journal | Digital Health |
| Volume | 11 |
| Pages (from-to) | 20552076251320298 |
| ISSN | 2055-2076 |
| DOI | |
| Status | Published - 2025 |