Psychometric properties and detectability of GPT-4o-generated multiple-choice questions compared with human-authored items across imaging specialties


Abstract

Large language models (LLMs) have the potential to scale assessment in medical education, but their psychometric equivalence to expert-written items and the detectability of their origin remain uncertain. In a preregistered, single-center, blinded, observational, within-subject comparison, we evaluated 24 GPT-4o-generated versus 24 human-authored, topic-matched multiple-choice questions (MCQs) across radiation oncology, radiology, and nuclear medicine. Medical students (n = 82) and physicians (n = 46) completed an identical 48-item formative mock examination with item origin masked. Item difficulty (human: mean 0.65 [SD 0.22] vs LLM: 0.67 [0.20]) and discrimination (0.27 [0.12] vs 0.29 [0.12]) did not differ significantly, and participants did not identify item origin above chance (0.50). Expert ratings of appropriateness and didactic quality showed low interrater agreement (ICC = 0.07-0.18). In this expert-reviewed, human-in-the-loop workflow, MCQs generated with GPT-4o did not differ significantly from expert-authored items in difficulty or discriminatory power, and examinees did not reliably recognize them as AI-generated. These findings delineate a feasible pathway for responsibly scaling formative assessment content in imaging-focused medical education, while underscoring the need for explicit educational policies on oversight, transparency, and fairness.
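The abstract reports classical test theory statistics for each item: difficulty (the proportion of examinees answering correctly) and discrimination (how well an item separates stronger from weaker examinees). The paper's analysis code is not given here, so the sketch below only illustrates the standard definitions on simulated data; the function name, the simulated response matrix, and the use of a corrected item-total point-biserial correlation as the discrimination index are assumptions for illustration, not the authors' exact method.

```python
# Minimal sketch of classical item statistics of the kind reported in the
# abstract (difficulty = proportion correct; discrimination = corrected
# item-total point-biserial correlation). Illustrative only.
import numpy as np

def item_statistics(responses: np.ndarray) -> list[dict]:
    """responses: binary matrix (examinees x items), 1 = correct, 0 = incorrect."""
    n_examinees, n_items = responses.shape
    total_scores = responses.sum(axis=1)
    stats = []
    for j in range(n_items):
        item = responses[:, j]
        difficulty = item.mean()          # proportion of examinees answering correctly
        rest_score = total_scores - item  # total score excluding the item itself
        if item.std() == 0 or rest_score.std() == 0:
            discrimination = float("nan")  # undefined if there is no variance
        else:
            discrimination = float(np.corrcoef(item, rest_score)[0, 1])
        stats.append({"item": j,
                      "difficulty": float(difficulty),
                      "discrimination": discrimination})
    return stats

# Simulated example matching the study's scale: 128 examinees (82 students
# + 46 physicians), 48 items, ~0.65 average probability of a correct answer.
rng = np.random.default_rng(0)
simulated = (rng.random((128, 48)) < 0.65).astype(int)
print(item_statistics(simulated)[:3])
```

On real data, values near the abstract's means (difficulty around 0.65-0.67, discrimination around 0.27-0.29) would indicate moderately difficult items with acceptable discriminatory power.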
