TY - JOUR
T1 - Gathering Validity Evidence for a Simulation-Based Test of Otoscopy Skills
AU - von Buchwald, Josefine Hastrup
AU - Frendø, Martin
AU - Frithioff, Andreas
AU - Britze, Anders
AU - Frederiksen, Thomas Winther
AU - Melchiors, Jacob
AU - Andersen, Steven Arild Wuyts
PY - 2025/2
Y1 - 2025/2
N2 - OBJECTIVE: Otoscopy is a key clinical examination used by multiple healthcare providers, but training and testing of otoscopy skills remain largely uninvestigated. Simulator-based assessment of otoscopy skills exists, but evidence on its validity is scarce. In this study, we explored automated assessment and performance metrics of an otoscopy simulator through collection of validity evidence according to Messick's framework. METHODS: Novices and experienced otoscopists completed a test program on the Earsi otoscopy simulator. Automated assessment of diagnostic ability and performance was compared with manual ratings of technical skills. Reliability of assessment was evaluated using Generalizability theory. Linear mixed models and correlation analysis were used to compare automated and manual assessments. Finally, we used the contrasting groups method to define a pass/fail level for the automated score. RESULTS: A total of 12 novices and 12 experienced otoscopists completed the study. We found an overall G-coefficient of .69 for automated assessment. The experienced otoscopists achieved a significantly higher mean automated score than the novices (59.9% (95% CI [57.3%-62.6%]) vs. 44.6% (95% CI [41.9%-47.2%]), P < .001). For the manual assessment of technical skills, there was no significant difference, nor did the automated score correlate with the manually rated score (Pearson's r = .20, P = .601). We established a pass/fail standard of 49.3% for the simulator's automated score. CONCLUSION: We explored validity evidence supporting an otoscopy simulator's automated score, demonstrating that this score mainly reflects cognitive skills. Manual assessment therefore still seems necessary at this point, and external video-recording is necessary for valid assessment. To improve reliability, the test course should include more cases to achieve a higher G-coefficient, and a higher pass/fail standard should be used.
KW - Adult
KW - Clinical Competence
KW - Educational Measurement/methods
KW - Female
KW - Humans
KW - Male
KW - Otolaryngology/education
KW - Otoscopy/methods
KW - Reproducibility of Results
KW - Simulation Training
KW - technical skills training
KW - handheld otoscopy
KW - otology
KW - simulation-based training
KW - evidence-based medical education
UR - http://www.scopus.com/inward/record.url?scp=85206878191&partnerID=8YFLogxK
U2 - 10.1177/00034894241288434
DO - 10.1177/00034894241288434
M3 - Journal article
C2 - 39417404
SN - 0003-4894
VL - 134
SP - 70
EP - 78
JO - Annals of Otology, Rhinology and Laryngology
JF - Annals of Otology, Rhinology and Laryngology
IS - 2
ER -