Evaluation of large language models as a diagnostic tool for medical learners and clinicians using advanced prompting techniques

Abstract

BACKGROUND: Large language models (LLMs) have demonstrated capabilities in natural language processing and critical reasoning. Studies investigating their potential use as healthcare diagnostic tools have largely relied on proprietary models like ChatGPT and have not explored the application of advanced prompt engineering techniques. This study aims to evaluate the diagnostic accuracy of three open-source LLMs and the role of prompt engineering using clinical scenarios.

METHODS: We analyzed the performance of three open-source LLMs (llama-3.1-70b-versatile, llama-3.1-8b-instant, and mixtral-8x7b-32768) using advanced prompt engineering when answering Medscape Clinical Challenge questions. Responses were recorded and evaluated for correctness, accuracy, precision, specificity, and sensitivity. A sensitivity analysis was conducted by presenting the three LLMs with the challenge questions under basic prompting and by excluding cases with visual assets. Results were compared with previously published performance data on GPT-3.5.

RESULTS: Llama-3.1-70b-versatile, llama-3.1-8b-instant, and mixtral-8x7b-32768 achieved correct responses in 79%, 65%, and 62% of cases, respectively, outperforming GPT-3.5 (74%). Diagnostic accuracy, precision, sensitivity, and specificity all exceeded the values previously reported for GPT-3.5. Results generated using advanced prompting strategies were superior to those based on basic prompting. The sensitivity analysis revealed similar trends when cases with visual assets were excluded.

DISCUSSION: Using advanced prompting techniques, LLMs can generate clinically accurate responses. The study highlights the limitations of proprietary models like ChatGPT, particularly in terms of accessibility and reproducibility due to version deprecation. Future research should employ prompt engineering techniques and prioritize the use of open-source models to ensure research replicability.
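The abstract names accuracy, precision, sensitivity, and specificity but does not say how they were derived from multiple-choice answers. The sketch below shows one plausible way to compute them, and it is an assumption rather than the authors' stated procedure: each answer option is treated as a separate binary decision (selected or not selected, correct or incorrect). The names `ConfusionCounts`, `tally_options`, and `diagnostic_metrics` are hypothetical, and the example data is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int = 0  # correct option, selected by the model
    fp: int = 0  # incorrect option, selected by the model
    tn: int = 0  # incorrect option, not selected
    fn: int = 0  # correct option, not selected


def tally_options(questions):
    """Build confusion counts from (n_options, correct_idx, chosen_idx) tuples.

    Assumed scheme: every option of every question counts as one binary
    decision, and each question has exactly one correct option.
    """
    c = ConfusionCounts()
    for n_options, correct_idx, chosen_idx in questions:
        if chosen_idx == correct_idx:
            c.tp += 1
            c.tn += n_options - 1  # all remaining options correctly rejected
        else:
            c.fp += 1              # the wrong option that was picked
            c.fn += 1              # the right option that was missed
            c.tn += n_options - 2  # the other wrong options, correctly rejected
    return c


def diagnostic_metrics(c):
    """Return the four standard metrics; undefined ratios become NaN."""
    def ratio(num, den):
        return num / den if den else float("nan")

    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": ratio(c.tp + c.tn, total),
        "precision": ratio(c.tp, c.tp + c.fp),
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
    }


if __name__ == "__main__":
    # Invented data: two four-option questions, one answered correctly.
    counts = tally_options([(4, 0, 0), (4, 1, 2)])
    print(diagnostic_metrics(counts))
```

Under this per-option scheme, the share of questions answered correctly (the "correct responses" figure) is distinct from accuracy, because correctly rejected wrong options raise accuracy even when the final answer is wrong.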
