Automated Esophageal Cancer Staging From Free-Text Radiology Reports: Large Language Model Evaluation Study



Authors: Yao, Yao; Cen, Xingxing; Gan, Lu; Jiang, Jiehui; Wang, Min; Xu, Yinghui; Yuan, Junyi

Journal:	JMIR Medical Informatics	Impact factor:	3.800
Year:	2025	Citation:	2025 Oct 17;13:e75556
DOI:	10.2196/75556	Research area:	Oncology
Disease type:	Esophageal cancer

Abstract

BACKGROUND: Accurate staging of esophageal cancer is crucial for determining prognosis and guiding treatment strategies, but manual interpretation of radiology reports by clinicians is prone to variability, which limits staging accuracy. Recent advances in large language models (LLMs) have shown promise in medical applications, but their utility in esophageal cancer staging remains underexplored.

OBJECTIVE: This study aims to compare the performance of 3 locally deployed LLMs (INF-72B, Qwen2.5-72B, and LLaMA3.1-70B) with that of clinicians in preoperative esophageal cancer staging using free-text radiology reports.

METHODS: This retrospective study included 200 patients from Shanghai Chest Hospital who underwent esophageal cancer surgery from May to December 2024. The dataset consisted of 1134 Chinese free-text radiology reports. The reference standard was derived from postoperative pathological staging. The 3 LLMs determined tumor classification (T1-T4), node classification (N0-N3), and overall staging (I-IV) using 3 prompting strategies: zero-shot, chain-of-thought, and a proposed interpretable reasoning (IR) method. The McNemar test and Pearson chi-square test were used for comparisons.

RESULTS: INF-72B+IR achieved an overall staging accuracy of 61.5% and an F1-score of 0.60, substantially higher than the clinicians' accuracy of 39.5% and F1-score of 0.39 (all P<.001). Qwen2.5-72B+IR also outperformed the clinicians, with an overall staging accuracy of 46% and an F1-score of 0.51 (P<.001). LLaMA3.1-70B showed no statistically significant difference in overall staging performance compared to clinicians (all P>0.5).

CONCLUSIONS: This study demonstrates that LLMs, particularly when guided by the proposed IR strategy, can accurately and reliably perform esophageal cancer staging from free-text radiology reports. This approach not only provides high-performance predictions but also offers a transparent and verifiable reasoning process, highlighting its potential as a valuable decision-support tool to augment human expertise in complex clinical diagnostic tasks.
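
The paired comparison described in METHODS (model versus clinician staging, both scored against the pathological reference) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say whether the F1-score is macro-averaged or whether the exact or chi-square form of the McNemar test was used, so the sketch uses macro-averaging and the exact binomial form. The function names and the toy stage labels are hypothetical and are not the study's code or data.

```python
from math import comb


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted stage equals the reference stage."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the abstract)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for paired correctness of raters A and B."""
    # Discordant pairs: b = A right and B wrong, c = A wrong and B right.
    b = sum(a == t and m != t for t, a, m in zip(y_true, pred_a, pred_b))
    c = sum(a != t and m == t for t, a, m in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy, made-up labels (not study data): reference, model, and clinician stages.
reference = ["I", "II", "III", "II", "IV", "III", "I", "II"]
model = ["I", "II", "III", "III", "IV", "III", "I", "II"]
clinician = ["I", "III", "II", "III", "IV", "II", "II", "II"]

print("model accuracy:", accuracy(reference, model))
print("clinician accuracy:", accuracy(reference, clinician))
print("model macro-F1:", round(macro_f1(reference, model), 3))
print("McNemar exact p:", mcnemar_exact(reference, model, clinician))
```

The McNemar test suits this design because both raters stage the same patients, so only the cases where exactly one of them is correct carry information about the difference between them.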
