Large language models standardize the interpretation of complex oncology guidelines for brain metastases

Authors: Akkus Yildirim, Berna; Tutun, Baver; Durak, Gorkem; Yildirim, Emre Batuhan; Uysal, Emre; Erturk, Sukru Mehmet; Bagci, Ulas

Year:	2025	Citation:	2025 Dec 16;6(1):56
DOI:	10.1038/s43856-025-01315-6	Research area:	Oncology, Neuroscience

Abstract

BACKGROUND: The interpretation of nuanced recommendations within complex clinical oncology guidelines, such as those for brain metastases, presents persistent challenges for medical experts, potentially impacting treatment consistency. While Large Language Models offer potential decision support, their comparative efficacy in this domain remains underexplored. This study evaluated the accuracy and convergence of medical experts versus leading Large Language Models in interpreting Strength of Recommendation and Quality of Evidence from the ASTRO and ASCO-SNO-ASTRO brain metastases guidelines.

METHODS: Neurosurgeons, radiation oncologists, and four Large Language Models (ChatGPT-4o, Gemini 2.0, Microsoft Copilot Pro, DeepSeek R1) assessed the Strength of Recommendation and Quality of Evidence for guideline recommendations. Accuracy, near-answer rates, and Cohen's weighted kappa (κ) were calculated.

RESULTS: Large Language Models, notably Gemini and DeepSeek, demonstrate significantly higher accuracy (up to 100% for ASTRO Strength of Recommendation vs. a maximum of 58.82% for experts) and near-perfect convergence (κ up to 1.000 vs. κ ≤ 0.504 for experts) in interpreting ASTRO guideline specifics. While all groups found the Quality of Evidence and the more complex ASCO guideline more challenging, Large Language Models generally maintain an advantage in convergence, with DeepSeek achieving 61.53% accuracy and κ = 0.428 for ASCO Strength of Recommendation versus a maximum of 53.84% accuracy and highly variable convergence for experts.

CONCLUSIONS: Large Language Models demonstrate significantly higher accuracy than human experts in structured interpretation of guideline classifications, with near-perfect inter-model convergence. This supports their role as standardization tools for guideline parsing, freeing experts for patient-specific reasoning where clinical context, comorbidities, and preferences dominate decision-making.
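
The three metrics named in the methods (accuracy, near-answer rate, and Cohen's weighted kappa) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the abstract: ratings are coded as integers on an ordinal scale, a "near answer" is taken to mean exactly one level away from the reference, and linear disagreement weights are used by default (the abstract does not state the weighting scheme). The function names and the toy ratings are hypothetical and are not study data.

```python
import math


def accuracy(reference, ratings):
    """Fraction of ratings that exactly match the reference category."""
    return sum(r == g for r, g in zip(ratings, reference)) / len(reference)


def near_answer_rate(reference, ratings):
    """Fraction of ratings exactly one ordinal level from the reference.

    Assumption: this is one plausible reading of "near-answer rate";
    the abstract does not define it.
    """
    return sum(abs(r - g) == 1 for r, g in zip(ratings, reference)) / len(reference)


def weighted_kappa(a, b, n_categories, weights="linear"):
    """Cohen's weighted kappa for two raters on categories 0..n_categories-1.

    Uses disagreement weights |i - j| / (k - 1), squared when
    weights="quadratic". Returns NaN when chance disagreement is zero
    (both raters used one identical category), where kappa is undefined.
    """
    k, n = n_categories, len(a)

    # Contingency table of rater a (rows) against rater b (columns).
    table = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        table[x][y] += 1
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells) / (n * n)

    if expected == 0:
        return math.nan
    return 1.0 - observed / expected


# Hypothetical toy ratings on a 3-level ordinal scale (not study data).
reference = [0, 1, 2, 2, 1]
rater = [0, 1, 2, 1, 1]

print(accuracy(reference, rater))
print(near_answer_rate(reference, rater))
print(weighted_kappa(reference, rater, n_categories=3))
```

With linear weights a rating one level off counts as half a disagreement, so kappa penalizes near misses less than distant ones. Passing `weights="quadratic"` makes that penalty gap steeper.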
