Cardiology knowledge assessment of retrieval-augmented open versus proprietary large language models


Abstract

Large language models (LLMs) are increasingly integrated into clinical workflows, yet their performance in cardiovascular medicine remains insufficiently evaluated. We evaluated the performance of open-weight and proprietary LLMs, with and without retrieval-augmented generation (RAG), on cardiology board-style questions and benchmarked them against the human average. We tested 14 LLMs (6 open-weight, 8 proprietary) on 449 multiple-choice questions from the American College of Cardiology Self-Assessment Program (ACCSAP). Accuracy was measured as percent correct. RAG was implemented using a knowledge base of 123 guideline and textbook documents. The open-weight model DeepSeek R1 achieved the highest accuracy at 86.9% (95% CI: 83.4-89.7%), outperforming proprietary models and the human average of 78%. GPT-4o (80.9%, 95% CI: 77.0-84.2%) and the commercial platform OpenEvidence (81.3%, 95% CI: 77.4-84.7%) performed similarly. Within model families, accuracy correlated positively with model size, but across families, substantial variability persisted among models with similar parameter counts. With RAG, all models improved, and open-weight models such as Mistral Large 2 (78.0%, 95% CI: 73.9-81.5%) performed comparably to proprietary alternatives such as GPT-4o. Open-weight LLMs demonstrate competency in cardiovascular medicine comparable to or exceeding that of proprietary models, with and without RAG depending on the model, and RAG is particularly beneficial for smaller models. Given their transparency, configurability, and potential for local deployment, strategically augmented open-weight models represent viable, lower-cost alternatives for clinical applications.
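The abstract reports accuracies with 95% confidence intervals (e.g., 86.9%, 95% CI: 83.4-89.7% on n = 449 questions). The paper does not state which interval method was used; as a sketch, the Wilson score interval reproduces the reported bounds if we assume the top score corresponds to 390/449 correct (390/449 ≈ 86.9%; both the method and the count are assumptions, not taken from the source):

```python
import math

def wilson_ci(k, n, z=1.959964):
    """Wilson score interval for a binomial proportion: k successes out of n trials.

    z defaults to the two-sided 95% normal quantile. Returns (lower, upper)
    as proportions in [0, 1].
    """
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Assumed example: 390 of 449 questions correct (~86.9%), matching the
# reported CI of roughly 83.4-89.7%.
lo, hi = wilson_ci(390, 449)
```

The same function applied to the other reported accuracies (with their implied correct counts) would let a reader check the remaining intervals.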
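The RAG setup retrieves passages from a 123-document knowledge base and supplies them to the model alongside the question. The paper does not describe its retrieval pipeline; the sketch below shows the general pattern with a toy bag-of-words cosine retriever (all function names and the similarity method are illustrative assumptions; production systems typically use dense embeddings):

```python
import math
from collections import Counter

def bow_vector(text):
    """Lowercased bag-of-words term counts (illustrative stand-in for embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    qv = bow_vector(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, bow_vector(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, docs, k=2):
    """Prepend retrieved context to the question, the core RAG pattern."""
    context = "\n".join(retrieve(question, docs, k))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

In the study's setting, `docs` would be chunks of the 123 guideline and textbook documents, and the returned prompt would be sent to each LLM under test.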
