The Alberta Quality Assessment Tool: Risk of Bias (AQAT:RoB) for the Evaluation of Medical Large Language Model Question-Answer Studies: Development and Pilot Validation


Authors: Ye, Carrie; Mitchell, Joseph Ross; Baumgart, Daniel C; Ma, Zechen; Fung, Angela Lim; Orellana, Daniela Garcia; Chowdhury, Juel; Abass, Abdullah; Katz, Steven; Jaremko, Jacob L; Boulanger, Pierre; Barber, Claire E H; Lemermeyer, Gillian; Jabbari, Hosna; Mou, Lili; Mirzaei, Maryam; Githumbi, Mary Waithera Beckett; Tandon, Puneeta; Goebel, Randy; Clark, Rhys; Hung, Whitney; Abbasi, Marjan; Maleki, Farhad; Klarenbach, Scott; Abdalla, Mohamed

Journal: Journal of Medical Internet Research (impact factor: 6.000)
Year: 2026
Citation: 2026 Apr 8;28:e87057
DOI: 10.2196/87057

Abstract

BACKGROUND: Despite the transformative potential of large language models (LLMs) in health care, the rapid development of these tools has outpaced their rigorous evaluation. While artificial intelligence-specific reporting guidelines have been developed to standardize the reporting of artificial intelligence studies, there is currently no specific tool for risk-of-bias assessment of LLM question-answer (QA) studies. Existing risk-of-bias tools for medical research are not well suited to the unique challenges of evaluating LLM-QA studies, which creates a critical gap in assessing their safety and effectiveness.

OBJECTIVE: This study aims to develop the Alberta Quality Assessment Tool: Risk of Bias (AQAT:RoB) to systematically evaluate the validity and risk of bias of LLM-QA studies.

METHODS: We conducted 2 literature reviews. The first was on quality assessment tools for LLM-QA studies, and the second was on LLM-QA studies themselves; together they informed the first draft of the AQAT:RoB. The draft AQAT:RoB was further refined through a prespecified iterative process of modified Delphi, consensus meeting, and validation. The first Delphi process occurred between May 1 and May 20, 2025, and the first consensus meeting was held on May 22. The first round of validation was completed on 16 randomly selected studies by 4 evaluators who were not part of the consensus meeting. As this first round of validation surpassed our a priori threshold of ≥80% agreement and a Cohen κ of ≥0.61 between evaluators, no further rounds of development and validation were undertaken. A second Delphi process occurred between February 20 and February 23, 2026, to vote on postpilot changes in response to peer review.

RESULTS: The AQAT:RoB consists of 5 high-level domains (Questions, Reference Answers, LLM Answers, Evaluators, Outcomes). These domains are subdivided into 9 subdomains. Each subdomain includes at least one "Support for Judgment" and at least one "Type of Bias" and is to be rated "low," "high," or "unclear" for risk of bias. A pilot evaluation was completed by internal validators who were not part of the consensus discussion and were asked to complete the AQAT:RoB form for each assigned study. Each of the 16 studies was evaluated independently by 2 evaluators. Pilot validation showed a percent agreement of 86.1% and a Cohen κ of 0.70 between assessors.

CONCLUSIONS: The AQAT:RoB demonstrates promising initial reliability for assessing the validity or risk of bias of LLM-QA studies. The tool will benefit from future refinements, external validation, and periodic updates to keep pace with evolving technology.
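The two reliability statistics reported in the abstract, percent agreement and Cohen κ, can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the abstract does not say how ratings were pooled across subdomains and studies, so the sketch simply assumes that all paired ratings from two evaluators are placed in two flat lists. The function names and the toy ratings are hypothetical; only the rating categories ("low," "high," "unclear") and the a priori thresholds (≥80% agreement, κ ≥0.61) come from the abstract.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same, nonzero number of ratings")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions,
    # summed over every category either rater used.
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout; kappa is
        # undefined (0/0), so report perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def meets_threshold(rater_a, rater_b, min_agreement=0.80, min_kappa=0.61):
    """Check the a priori thresholds stated in the abstract."""
    return (percent_agreement(rater_a, rater_b) >= min_agreement
            and cohen_kappa(rater_a, rater_b) >= min_kappa)


# Hypothetical toy ratings (NOT data from the study).
rater_1 = ["low", "low", "high", "unclear", "low",
           "high", "low", "low", "unclear", "high"]
rater_2 = ["low", "low", "high", "low", "low",
           "high", "low", "high", "unclear", "high"]

if __name__ == "__main__":
    print(f"agreement = {percent_agreement(rater_1, rater_2):.1%}")
    print(f"kappa     = {cohen_kappa(rater_1, rater_2):.2f}")
    print(f"threshold met: {meets_threshold(rater_1, rater_2)}")
```

On the toy lists the raters agree on 8 of 10 items (80.0%), chance agreement is 0.39, and κ is about 0.67, so the example pair would just clear both thresholds. Because κ corrects for chance agreement, it is always lower than raw percent agreement unless agreement is perfect, which is why the abstract reports both figures.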

