Assessing Large Language Models for Early Article Identification in Otolaryngology-Head and Neck Surgery Systematic Reviews



Abstract

BACKGROUND: To assess the effectiveness of ChatGPT and Bard in the initial identification of articles for Otolaryngology-Head and Neck Surgery systematic literature reviews.

METHODS: Three PRISMA-based systematic reviews (Jabbour et al. 2017, Wong et al. 2018, and Wu et al. 2021) were replicated using ChatGPTv3.5 and Bard. Outputs (author, title, publication year, and journal) were compared to the original references and cross-referenced with medical databases to assess authenticity and recall.

RESULTS: Several themes emerged when comparing Bard and ChatGPT across the three reviews. Bard generated more outputs and had greater recall in Wong et al.'s review, and covered a broader date range in Jabbour et al.'s review. In Wu et al.'s review, ChatGPT-2 had higher recall and identified more authentic outputs than Bard-2.

CONCLUSION: Large language models (LLMs) failed to fully replicate peer-reviewed methodologies: their outputs contained inaccuracies, but they also identified relevant articles, especially recent ones, that the original reference lists had missed. While human-led PRISMA-based reviews remain the gold standard, refining LLMs for literature reviews shows potential.
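The abstract does not define how recall and authenticity were calculated. As a rough illustration only, the sketch below assumes recall means the share of the original review's references that an LLM reproduced, and authenticity means the share of LLM outputs that could be verified in a medical database. The function names, the title-based matching, and the placeholder titles are all hypothetical and are not taken from the study.

```python
def normalize(title: str) -> str:
    """Lowercase and keep only letters and digits, so punctuation,
    spacing, and capitalization differences do not block a match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def recall(llm_titles, reference_titles) -> float:
    """Assumed definition: fraction of the original review's references
    that also appear among the LLM outputs."""
    refs = {normalize(t) for t in reference_titles}
    if not refs:
        return 0.0
    found = {normalize(t) for t in llm_titles} & refs
    return len(found) / len(refs)


def authenticity_rate(llm_titles, verified_titles) -> float:
    """Assumed definition: fraction of distinct LLM outputs that were
    confirmed to exist (e.g. by a manual database lookup)."""
    outputs = {normalize(t) for t in llm_titles}
    if not outputs:
        return 0.0
    verified = {normalize(t) for t in verified_titles}
    return len(outputs & verified) / len(outputs)


# Placeholder titles, not real articles or study data.
llm_output = ["Example Article A", "Example article B!", "Invented Article"]
original_refs = ["example article a", "Example Article B",
                 "Example Article C", "Example Article D"]
confirmed_in_database = ["Example Article A", "Example Article B"]

print(recall(llm_output, original_refs))
print(authenticity_rate(llm_output, confirmed_in_database))
```

Matching on normalized titles alone is a simplification; a real comparison would likely also check author, publication year, and journal, which are the fields the study collected.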
