Comparing Machine Learning Models and Human Raters When Ranking Medical Student Performance Evaluations


Abstract

BACKGROUND: The Medical Student Performance Evaluation (MSPE), a narrative summary of each student's academic and professional performance in US medical school, is lengthy, making it challenging for residency programs that must evaluate large numbers of applicants.

OBJECTIVE: To create a rubric for assessing MSPE narratives and to compare the ability of 3 commercially available machine learning models (MLMs) to rank MSPEs in order of positivity.

METHODS: Thirty of a possible 120 MSPEs from the University of Central Florida class of 2020 were de-identified and subjected both to manual scoring and ranking by a pair of faculty members, using a new rubric based on the Accreditation Council for Graduate Medical Education competencies, and to global sentiment analysis by the MLMs. Correlation analysis was used to assess reliability and agreement between the student rank orders produced by faculty and by the MLMs.

RESULTS: The intraclass correlation coefficient for faculty interrater reliability was 0.864 (P<.001; 95% CI 0.715-0.935) for total rubric scores and ranged from 0.402 to 0.768 for isolated subscales; faculty rank orders were also highly correlated (r(s)=0.758; P<.001; 95% CI 0.539-0.881). The authors report good feasibility: the rubric was easy to use and added minimal time to reading MSPEs. The MLMs correctly reported a positive sentiment for all 30 MSPE narratives, but their rank orders showed no significant correlation with each other or with the faculty rankings.

CONCLUSIONS: The rubric provided reliable overall scoring and ranking of MSPEs. The MLMs accurately detected positive sentiment in the MSPEs but were unable to provide reliable rank ordering.
