Transforming literature screening: The emerging role of large language models in systematic reviews


Abstract

Systematic reviews (SRs) synthesize the evidence-based medical literature, but they involve labor-intensive manual article screening. Large language models (LLMs) can select relevant literature, but how their quality and efficacy compare to human reviewers remains to be determined. We evaluated the overlap between articles selected from titles and abstracts by 18 different LLMs and those selected by humans for three SRs. In the three SRs, 185/4,662, 122/1,741, and 45/66 articles had been selected and considered for full-text screening by two independent reviewers. Due to technical variations and the inability of the LLMs to classify all records, the LLMs' effective sample sizes were smaller. On average, however, the 18 LLMs correctly classified 4,294 (min 4,130; max 4,329), 1,539 (min 1,449; max 1,574), and 27 (min 22; max 37) of the titles and abstracts as either included or excluded for the three SRs, respectively. Additional analysis revealed that the definitions of the inclusion criteria and the conceptual designs significantly influenced LLM performance. In conclusion, LLMs can reduce one reviewer's workload by 33% to 93% during title and abstract screening. However, for ideal LLM support, the exact formulation of the inclusion and exclusion criteria should be refined beforehand.
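The comparison described above can be illustrated with a minimal sketch. The function, record IDs, and labels below are hypothetical (the paper does not publish its analysis code); it only shows one plausible way to score an LLM's include/exclude decisions against the human reference and to estimate how much of one reviewer's screening workload the LLM's exclusions would absorb. As in the study, records the LLM fails to classify are dropped from the comparison.

```python
# Hedged sketch (not the authors' code): scoring LLM title/abstract
# screening decisions against human reference decisions for one SR.
# All record IDs and labels are illustrative toy data.

def screening_metrics(human, llm):
    """human/llm: dicts mapping record id -> True (include) / False (exclude).

    Only records the LLM managed to classify are compared, mirroring the
    paper's smaller effective LLM sample sizes.
    """
    compared = [rid for rid in human if rid in llm]
    correct = sum(human[rid] == llm[rid] for rid in compared)
    # If the LLM stands in for one of two reviewers, every record it
    # excludes is a record that reviewer no longer screens.
    excluded = sum(not llm[rid] for rid in compared)
    return {
        "compared": len(compared),
        "accuracy": correct / len(compared),
        "workload_reduction": excluded / len(compared),
    }

# Toy example: five human-screened records; the LLM fails on record 4
# and misclassifies record 3.
human = {1: True, 2: False, 3: False, 4: True, 5: False}
llm = {1: True, 2: False, 3: True, 5: False}
metrics = screening_metrics(human, llm)
```

On this toy data the LLM is scored on 4 of 5 records, classifying 3 of them correctly; scaled up, the same per-SR accuracy counts (e.g. 4,294 of 4,662) and exclusion shares yield the 33%–93% workload-reduction range reported in the abstract.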
