Experts' prediction of item difficulty of multiple-choice questions in the Ethiopian Undergraduate Medicine Licensure Examination



Abstract

BACKGROUND: How well experts' item difficulty ratings predict test takers' actual performance is an important aspect of licensure examinations. Expert judgment is a primary source of information for the decisions, made in advance, that determine the pass rate of test takers. The characteristics of the raters who predict item difficulty are therefore central to setting credible standards. This study aimed to assess and compare raters' predicted difficulty and the actual difficulty of multiple-choice questions (MCQs) in the undergraduate medicine licensure examination (UGMLE) in Ethiopia.

METHOD: The study used the responses of 815 examinees to 200 MCQs, together with the item difficulty ratings of seven physicians who participated in the standard setting of the UGMLE. The analysis examined how experts' ratings varied in predicting the actual item difficulty observed among examinees. Descriptive statistics were used to profile the mean rater-predicted and actual difficulty values for the MCQs, and ANOVA was used to compare mean differences between raters' predictions of item difficulty. Regression analysis was used to examine interrater variation in item difficulty predictions relative to the actual difficulty, and to compute the proportion of variance in actual difficulty explained by the raters' predictions.

RESULTS: The mean difference between raters' predictions and examinees' actual performance was inconsistent across the exam domains. There was a statistically significant, strong positive correlation between actual and predicted item difficulty in exam domains eight and eleven. In exam domains seven and twelve, however, the positive correlation was very weak and not statistically significant. The multiple comparison analysis showed significant differences in mean item difficulty ratings between raters. In the regression analysis, experts' item difficulty ratings of the UGMLE explained 33% of the variance in the actual difficulty level. The regression model also showed a moderate positive correlation (R = 0.57) that was statistically significant, F(6, 193) = 15.58, P = 0.001.

CONCLUSION: This study demonstrated the complexity of assessing the difficulty level of MCQs in the UGMLE and emphasized the benefits of obtaining experts' ratings in advance. To ensure that the exam continues to yield reliable and valid scores, raters' accuracy on the UGMLE must be improved. Achieving this requires techniques that align with evolving assessment methodologies.
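To make the quantities in the abstract concrete, the following is a minimal sketch of how actual item difficulty and its agreement with rater predictions might be computed. It assumes, as in classical test theory, that actual difficulty is the proportion of examinees answering an item correctly, and it simplifies the study's regression to a single predictor (the mean rating across raters). The function names and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt


def item_difficulty(responses):
    """Actual difficulty per item: proportion of examinees answering correctly.

    responses: one row per examinee, each a list of 0/1 item scores.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_examinees for j in range(n_items)]


def mean_rater_prediction(ratings):
    """Predicted difficulty per item, averaged over raters.

    ratings: one row per rater, each a list of predicted proportions correct.
    """
    n_raters = len(ratings)
    n_items = len(ratings[0])
    return [sum(r[j] for r in ratings) / n_raters for j in range(n_items)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy data (hypothetical): 4 examinees x 4 items, scored 1 = correct.
responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
]
# Toy predictions (hypothetical) from 2 raters for the same 4 items.
ratings = [
    [0.9, 0.7, 0.6, 0.3],
    [0.8, 0.6, 0.4, 0.4],
]

actual = item_difficulty(responses)
predicted = mean_rater_prediction(ratings)
r = pearson_r(predicted, actual)
# With a single predictor, the variance explained by a linear fit is r squared.
r_squared = r ** 2

print("actual difficulty:", actual)
print("mean predicted difficulty:", predicted)
print("r =", round(r, 3), "R^2 =", round(r_squared, 3))
```

The same relationship links the two figures reported above: R = 0.57 gives R² ≈ 0.32, in line with the roughly 33% of variance explained. The study's model has six predictor degrees of freedom, so a faithful reproduction would need a multiple regression on the individual raters' ratings rather than the single averaged predictor used here.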
