OBJECTIVES: ChatGPT is an artificial intelligence model that can interpret free-text prompts and return detailed, human-like responses across a wide domain of subjects. This study evaluated the extent of the threat posed by ChatGPT to the validity of short-answer assessment problems used to examine pre-clerkship medical students in our undergraduate medical education program.
METHODS: Forty problems used in prior student assessments were retrieved and stratified by levels of Bloom's Taxonomy. Thirty of these problems were submitted to ChatGPT-3.5. For the remaining 10 problems, we retrieved past minimally passing student responses. Six tutors graded each of the 40 responses. Performance of student-generated and ChatGPT-generated answers, aggregated as a whole and grouped by Bloom's levels of cognitive reasoning, was compared using t-tests, ANOVA, Cronbach's alpha, and Cohen's d. Scores for ChatGPT-generated responses were also compared to historical class average performance.
RESULTS: ChatGPT-generated responses received a mean score of 3.29 out of 5 (n = 30, 95% CI 2.93-3.65) compared to 2.38 for a group of students meeting minimum passing marks (n = 10, 95% CI 1.94-2.82), representing higher performance (P = .008, η² = 0.169), but were outperformed by historical class average scores on the same 30 problems (mean 3.67, P = .018) when including all past responses regardless of student performance level. There was no statistically significant trend in performance across domains of Bloom's Taxonomy.
CONCLUSION: While ChatGPT was able to pass short-answer assessment problems spanning the pre-clerkship curriculum, it outperformed only underperforming students. In several cases, tutors were convinced that ChatGPT-produced responses had been written by students. Risks to assessment validity include uncertainty in identifying struggling students and inability to intervene in a timely manner.
The performance of ChatGPT on problems requiring increasing demands of cognitive reasoning warrants further research.
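The methods describe comparing score distributions with t-tests and Cohen's d. As a minimal sketch of that kind of comparison, the snippet below computes Welch's t statistic and a pooled-SD Cohen's d using only the Python standard library. The score lists are hypothetical illustrations on the study's 0-5 grading scale, not the study's actual data.

```python
import math
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d: standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    s1, s2 = stdev(a), stdev(b)
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(a) - mean(b)) / pooled

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    n1, n2 = len(a), len(b)
    v1, v2 = stdev(a)**2, stdev(b)**2
    return (mean(a) - mean(b)) / math.sqrt(v1 / n1 + v2 / n2)

# Hypothetical tutor scores on a 0-5 scale (illustrative only, NOT the study's data)
gpt_scores = [3.0, 3.5, 4.0, 3.0, 2.5, 3.5, 3.0, 4.0, 3.5, 3.0]
student_scores = [2.0, 2.5, 3.0, 2.0, 2.5, 2.0, 3.0, 2.5, 2.0, 2.5]

print(round(cohens_d(gpt_scores, student_scores), 2))
print(round(welch_t(gpt_scores, student_scores), 2))
```

In practice one would pair the t statistic with a p-value (e.g. via SciPy's `scipy.stats.ttest_ind` with `equal_var=False`); the sketch keeps to the standard library to show only the arithmetic behind the effect-size comparison.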
Examining the Threat of ChatGPT to the Validity of Short Answer Assessments in an Undergraduate Medical Program.
Authors: Morjaria Leo, Burns Levi, Bracken Keyna, Ngo Quang N, Lee Mark, Levinson Anthony J, Smith John, Thompson Penelope, Sibbald Matthew
| Journal: | Journal of Medical Education and Curricular Development | Impact factor: | 1.600 |
| Year: | 2023 | Citation: | 2023 Sep 28; 10:23821205231204178 |
| DOI: | 10.1177/23821205231204178 | | |
