Evaluating Retrieval Augmented Generation-enhanced Large Language Models for Question Answering On German Neurovascular Guidelines

评估检索增强生成增强的大型语言模型在德国神经血管指南问答中的应用

阅读:1

Abstract

PURPOSE: To investigate the feasibility of Retrieval-augmented Generation (RAG)-enhanced Large Language Models (LLMs) in answering questions about two German neurovascular guidelines. METHODS: Four LLMs (GPT-4o-mini, Llama 3.1 405B Instruct Turbo, Mixtral 8 × 22B Instruct, and Claude 3.5 Sonnet) with RAG as well as GPT-4o-mini without RAG were evaluated for generating answers about two German neurovascular guidelines ("S3 Guideline for Diagnosis, Treatment, and Follow-up of Extracranial Carotid Stenosis" and "S2e Guideline for Acute Therapy of Ischemic Stroke"). The answers were classified as "correct", "inaccurate", or "incorrect" by two neurovascular experts in consensus. Additionally, retrieval performance of five retrieval strategies was analyzed on a synthetic dataset of 384 questions. RESULTS: Claude Sonnet 3.5 achieved the highest answer correctness (70.6% correct, 10.6% wrong), followed by Llama 3.1 (64.7%, 15.3% wrong), GPT-4o-mini with RAG (57.6%, 15.3% wrong), and Mixtral (56.6%, 17.6% wrong). GPT-4o-mini without RAG performed significantly worse (20.0%, 32.9% wrong). Retrieval errors were the primary cause of incorrect answers (80%). For retrieval, BM25 achieved the highest accuracy (82.0%), outperforming vector-based methods like "BAAI/bge-m3" (78.4%). CONCLUSION: RAG significantly improves LLM accuracy for medical guideline question answering compared to the inherent knowledge of pretrained LLMs alone while still showing significant error rates. Improved accuracy and confidence metrics are needed for safer implementation in clinical routine. Additionally, our results demonstrate the strong performance of general LLMs in medical question answering for non-English languages, such as German, even without specific training.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。