Automating expert-level medical reasoning evaluation of large language models
Journal: npj Digital Medicine
Impact factor: 15.1
doi:10.1038/s41746-025-02208-7
Zhou, Shuang; Xie, Wenya; Li, Jiaxi; Zhan, Zaifu; Song, Meijia; Yang, Han; Espinoza, Cheyenna; Welton, Lindsay; Mai, Xinnie; Jin, Yanwei; Xu, Zidu; Chung, Yuen-Hei; Xing, Yiyun; Tsai, Meng-Han; Schaffer, Emma; Shi, Yucheng; Liu, Ninghao; Liu, Zirui; Zhang, Rui