A human-LLM collaborative annotation approach for screening articles on precision oncology randomized controlled trials



Abstract

BACKGROUND: Supervised learning can accelerate article screening in systematic reviews, but it still requires labor-intensive manual annotation. While large language models (LLMs) such as GPT-3.5 offer a rapid and convenient alternative, their reliability remains a concern. This study aims to design an efficient and reliable annotation method for article screening.

METHODS: Given that relevant articles typically constitute only a small subset of those retrieved during screening, we propose a human-LLM collaborative annotation method that focuses on verifying the positive annotations made by the LLM. Initially, we optimized the prompt on a manually annotated standard dataset, refining it iteratively until the LLM achieved near-perfect recall. The LLM, guided by the optimized prompt, then annotated the articles, and humans verified only the LLM-identified positive samples. We applied this method to screen articles on precision oncology randomized controlled trials and evaluated both its efficiency and reliability.

RESULTS: For prompt optimization, a standard dataset of 200 manually annotated articles was split equally into a tuning set and a validation set (1:1 ratio). Through iterative prompt optimization, the LLM achieved near-perfect recall on the tuning and validation sets (100% and 85.71%, respectively). Using the optimized prompt, we conducted collaborative annotation. To evaluate its performance, we manually reviewed a random sample of 300 articles annotated by the collaborative method. The collaborative annotation achieved an F1 score of 0.9583 while reducing the annotation workload by approximately 80% compared to manual annotation alone. Additionally, a BioBERT-based supervised model trained on the collaborative annotation data outperformed the model trained on data annotated solely by the LLM, further validating the reliability of the collaborative annotation method.
CONCLUSIONS: The human-LLM collaborative annotation method demonstrates potential for enhancing the efficiency and reliability of article screening, offering valuable support for systematic reviews and meta-analyses.
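The workflow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy data, and the LLM/human callables are all hypothetical stand-ins. It shows the core idea of the method, trusting LLM negatives (made safe by tuning the prompt for near-perfect recall) and human-verifying only LLM positives, which is where the workload reduction comes from.

```python
# Hypothetical sketch of the human-LLM collaborative annotation loop.
# llm_label and human_verify are placeholder callables, not a real API.

def collaborative_annotate(articles, llm_label, human_verify):
    """Trust LLM negatives; have a human verify only LLM positives."""
    labels = {}
    verified = 0
    for art in articles:
        if llm_label(art):                    # LLM flags a candidate positive
            labels[art] = human_verify(art)   # human checks only this subset
            verified += 1
        else:
            labels[art] = False               # LLM negatives accepted as-is
    workload_reduction = 1 - verified / len(articles)
    return labels, workload_reduction

# Toy example: 10 retrieved articles, a high-recall prompt flags 2 positives.
articles = [f"a{i}" for i in range(10)]
llm_positives = {"a1", "a7"}
labels, saved = collaborative_annotate(
    articles,
    llm_label=lambda a: a in llm_positives,
    human_verify=lambda a: a == "a1",  # the human rejects one false positive
)
print(saved)  # fraction of articles needing no human review: 0.8
```

The near-perfect recall requirement is what makes skipping human review of LLM negatives defensible: with few false negatives, the error budget concentrates in the (small) positive set, which humans then clean up.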
