Performance comparison of large language models in boron neutron capture therapy knowledge assessment

Abstract

Accelerator-based boron neutron capture therapy (BNCT) is a binary radiation therapy that has developed rapidly in recent years. This study systematically evaluated and compared the performance of four mainstream model families [ChatGPT, Bard (Gemini), Claude, and ERNIE Bot] in answering BNCT-related knowledge questions, providing a reference for exploring their potential in BNCT professional education. Forty-seven bilingual BNCT questions covering key concepts, clinical practice, and reasoning tasks were constructed. The models were tested across five rounds, in both languages and across question formats. Accuracy, reasoning ability, uncertainty expression, and version effects were analyzed. ChatGPT (72.8%) and Claude (70.4%) showed significantly higher overall accuracy than Bard (Gemini) (62.0%) and ERNIE Bot (55.6%) (p < 0.001). Both high-performing models scored significantly better on reasoning-based questions than on fact-based questions (p < 0.001). The average performance improvement from version updates (7.51 ± 8.46 percentage points) was numerically higher than the change during same-version maintenance (0.61 ± 8.68 percentage points), although the difference was not statistically significant (p = 0.126). Language and questioning method showed statistically significant effects, but the effect sizes were minimal (partial η² < 0.01). Uncertainty acknowledgment rates varied significantly among the model families (4.7% to 23.7%, p = 0.003). ChatGPT can provide relatively accurate knowledge for the popularization of BNCT. However, existing general-purpose LLMs still cannot accurately answer all BNCT questions and differ significantly in how they express uncertainty. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1038/s41598-026-36322-7.
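
The abstract reports effects that are statistically significant yet have a partial η² below 0.01. A minimal sketch of how partial eta squared is conventionally computed from ANOVA sums of squares may help in reading that result. The function name and the input values are hypothetical and chosen only for illustration; they are not the study's data or its analysis code.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Return partial eta squared: SS_effect / (SS_effect + SS_error)."""
    if ss_effect < 0 or ss_error <= 0:
        raise ValueError("ss_effect must be >= 0 and ss_error must be > 0")
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares (NOT taken from the study): a small effect
# measured against a large residual term.
eta = partial_eta_squared(ss_effect=1.2, ss_error=230.0)
print(f"partial eta squared = {eta:.4f}")
```

With many graded responses, even an effect this small can reach statistical significance, which is why the p-value and the effect size are reported separately.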
