Artificial Intelligence-Assisted Data Extraction With a Large Language Model: A Study Within Reviews

Authors: Gartlehner, Gerald; Kugley, Shannon; Crotty, Karen; Viswanathan, Meera; Dobrescu, Andreea; Nussbaumer-Streit, Barbara; Booth, Graham; Treadwell, Jonathan R; Han, Jung Min; Wagner, Jesse; Apaydin, Eric A; Coppola, Erin L; Maglione, Margaret; Hilscher, Rainer; Chew, Robert; Pilar, Meagan; Swanton, Bryan; Kahwati, Leila C

Journal: Annals of Internal Medicine (impact factor: 15.200)
Published: 2025 Dec;178(12):1763-1771
DOI: 10.7326/ANNALS-25-00739

Abstract

BACKGROUND: Data extraction is a critical but error-prone and labor-intensive task in evidence synthesis. Unlike other artificial intelligence (AI) technologies, large language models (LLMs) do not require labeled training data for data extraction.

OBJECTIVE: To compare an AI-assisted versus a traditional, human-only data extraction process.

DESIGN: Study within reviews (SWAR) using a prospective, parallel-group comparison with blinded data adjudicators.

SETTING: Workflow validation within 6 ongoing systematic reviews of interventions under real-world conditions.

INTERVENTION: Initial data extraction using an LLM (Claude, versions 2.1, 3.0 Opus, and 3.5 Sonnet) verified by a human reviewer.

MEASUREMENTS: Concordance, time on task, accuracy, sensitivity, positive predictive value, and error analysis.

RESULTS: The 6 systematic reviews in the SWAR yielded 9341 data elements from 63 studies. Concordance between the 2 methods was 77.2% (95% CI, 76.3% to 78.0%). Compared with the reference standard, the AI-assisted approach had an accuracy of 91.0% (CI, 90.4% to 91.6%) and the human-only approach an accuracy of 89.0% (CI, 88.3% to 89.6%). Sensitivities were 89.4% (CI, 88.6% to 90.1%) and 86.5% (CI, 85.7% to 87.3%), respectively, with positive predictive values of 99.2% (CI, 99.0% to 99.4%) and 98.9% (CI, 98.6% to 99.1%). Incorrect data were extracted in 9.0% (CI, 8.4% to 9.6%) of AI-assisted cases and 11.0% (CI, 10.4% to 11.7%) of human-only cases, with corresponding proportions of major errors of 2.5% (CI, 2.2% to 2.8%) versus 2.7% (CI, 2.4% to 3.1%). Missed data items were the most frequent error type in both approaches. The AI-assisted method reduced data extraction time by a median of 41 minutes per study.

LIMITATIONS: Assessing concordance and classifying errors required subjective judgment. Consistently tracking time on task was challenging.

CONCLUSION: Data extraction assisted by AI may offer a viable, more efficient alternative to human-only methods.

PRIMARY FUNDING SOURCE: Agency for Healthcare Research and Quality and RTI International.
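The abstract reports concordance, accuracy, sensitivity, positive predictive value (PPV) and the proportion of incorrect data, but it does not spell out the formulas. The sketch below shows one plausible way such metrics can be computed, assuming a standard confusion-matrix framing in which each data element is compared with the reference standard and given one of four outcome labels. The labels (`correct`, `correct_empty`, `wrong`, `missed`), the function names, and the use of a Wilson score interval for the 95% CIs are all illustrative assumptions, not the authors' published method. Under this framing, accuracy and the incorrect-data proportion sum to 100%, which matches the abstract's pairs (91.0% and 9.0%; 89.0% and 11.0%).

```python
import math
from collections import Counter
from typing import Sequence

# Hypothetical outcome labels for one data element compared with the
# reference standard (an assumption; the abstract does not define them).
CORRECT = "correct"              # value present and extracted correctly (TP)
CORRECT_EMPTY = "correct_empty"  # nothing to extract, correctly left empty (TN)
WRONG = "wrong"                  # an incorrect value was extracted (FP)
MISSED = "missed"                # a value existed but was not extracted (FN)


def extraction_metrics(outcomes: Sequence[str]) -> dict[str, float]:
    """Accuracy, sensitivity, PPV and error proportion from per-element outcomes."""
    n = Counter(outcomes)
    tp, tn, fp, fn = n[CORRECT], n[CORRECT_EMPTY], n[WRONG], n[MISSED]
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of elements handled correctly
        "sensitivity": tp / (tp + fn),   # share of present values that were captured
        "ppv": tp / (tp + fp),           # share of extracted values that were right
        "error": (fp + fn) / total,      # wrong plus missed elements
    }


def concordance(method_a: Sequence[str], method_b: Sequence[str]) -> float:
    """Share of data elements for which two extraction methods give the same value."""
    if len(method_a) != len(method_b):
        raise ValueError("both methods must cover the same data elements")
    agree = sum(a == b for a, b in zip(method_a, method_b))
    return agree / len(method_a)


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives ~95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical tallies for 100 data elements; these are NOT data from the study.
outcomes = [CORRECT] * 80 + [CORRECT_EMPTY] * 10 + [WRONG] * 2 + [MISSED] * 8
m = extraction_metrics(outcomes)
low, high = wilson_ci(80 + 10, len(outcomes))
print(m, (low, high))
```

In this framing a missed element lowers sensitivity but leaves PPV untouched, while a wrong value lowers PPV. That is one way to read the abstract's pattern of very high PPV alongside lower sensitivity, with missed items as the most frequent error, though the paper's own error taxonomy may differ.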
