Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools
Journal: European Journal of Human Genetics
Impact factor: 4.6
doi:10.1038/s41431-026-02054-5
Reese, Justin T; Chimirri, Leonardo; Bridges, Yasemin; Danis, Daniel; Caufield, J Harry; Gargano, Michael A; Kroll, Carlo; Schmeder, Andrew; Liu, Fengchen; Wissink, Kyran; McMurry, Julie A; Graefe, Adam S L; Niyonkuru, Enock; Korn, Daniel R; Casiraghi, Elena; Valentini, Giorgio; Jacobsen, Julius O B; Haendel, Melissa; Smedley, Damian; Mungall, Christopher J; Robinson, Peter N