Abstract
BACKGROUND: Traditional cancer registries, limited by labor-intensive manual data abstraction and rigid, predefined schemas, often hinder timely and comprehensive oncology research. While large language models (LLMs) have shown promise in automating data extraction, their potential to perform direct, just-in-time (JIT) analysis on unstructured clinical narratives (potentially bypassing intermediate structured databases for many analytical tasks) remains largely unexplored.

OBJECTIVE: This study aimed to evaluate whether a state-of-the-art LLM (Gemini 2.5 Pro) can enable a JIT clinical oncology analysis paradigm by assessing its ability to (1) perform high-fidelity multiparameter data extraction, (2) answer complex clinical queries directly from raw text, (3) automate multistep survival analyses, including executable code generation, and (4) generate novel, clinically plausible hypotheses from free-text documentation.

METHODS: A synthetic dataset of 240 unstructured clinical letters from patients with stage IV non-small cell lung cancer (NSCLC), embedding 14 predefined variables, was used. Gemini 2.5 Pro was evaluated on four core JIT capabilities. Performance was measured using the following metrics: extraction accuracy (compared with human extraction for n=40 letters and across the full n=240 dataset); numerical deviation for direct question answering (n=40 to 240 letters, 5 questions); log-rank P value and Harrell concordance index for LLM-generated versus ground-truth Kaplan-Meier survival analyses (n=160 letters, overall survival and progression-free survival); and correct justification, novelty, and a qualitative evaluation of LLM-generated hypotheses (n=80 and n=160 letters).

RESULTS: For multiparameter extraction from 40 letters, the LLM achieved >99% average accuracy, comparable to human extraction, in substantially less time (LLM: 3.7 min vs human: 133.8 min).
Across the full 240-letter dataset, LLM multiparameter extraction maintained >98% accuracy for most variables. The LLM answered multiconditional clinical queries directly from raw text with a relative deviation rarely exceeding 1.5%, even with up to 240 letters. Crucially, it autonomously performed end-to-end survival analysis, generating executable R code from the raw text that produced Kaplan-Meier curves statistically indistinguishable from the ground truth. Consistent performance was demonstrated on a small validation cohort of 80 synthetic acute myeloid leukemia reports. Stress testing on data with simulated imperfections revealed a key role for a human in the loop in resolving AI-flagged ambiguities. Furthermore, the LLM generated several correctly justified, biologically plausible, and potentially novel hypotheses from datasets of up to 80 letters.

CONCLUSIONS: This feasibility study demonstrated that a frontier LLM (Gemini 2.5 Pro) can successfully perform high-fidelity data extraction, multiconditional querying, and automated survival analysis directly from unstructured text. These results provide a foundational proof of concept for the JIT clinical analysis approach. However, these findings are confined to synthetic patient data, and rigorous validation on real-world clinical data is an essential next step before clinical implementation can be considered.
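To make the survival-analysis benchmark concrete, the sketch below shows how a Kaplan-Meier curve and a two-sample log-rank P value (the statistic used above to compare LLM-generated and ground-truth analyses) can be computed. This is an illustrative, self-contained Python reimplementation on toy data, not the study's pipeline; the study's LLM generated R code, and all function names and inputs here are the author of this sketch's own assumptions.

```python
import math

def km_curve(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns a list of (time, survival probability) pairs.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    s = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = removed = 0
        # Group all subjects tied at time t (events and censorings).
        while i < len(order) and times[order[i]] == t:
            d += events[order[i]]
            removed += 1
            i += 1
        if d:  # Survival only drops at event times.
            s *= 1 - d / at_risk
            curve.append((t, s))
        at_risk -= removed
    return curve

def logrank_p(t1, e1, t2, e2):
    """Two-sample log-rank test; returns the chi-square P value (1 df)."""
    data = sorted([(t, e, 0) for t, e in zip(t1, e1)] +
                  [(t, e, 1) for t, e in zip(t2, e2)])
    n = [len(t1), len(t2)]          # at-risk counts per group
    obs_minus_exp = 0.0
    var = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = [0, 0]
        removed = [0, 0]
        while i < len(data) and data[i][0] == t:
            _, e, g = data[i]
            d[g] += e
            removed[g] += 1
            i += 1
        dj, nj = d[0] + d[1], n[0] + n[1]
        if dj and nj > 1:
            # Hypergeometric expectation and variance of group-1 events.
            obs_minus_exp += d[0] - dj * n[0] / nj
            var += dj * (n[0] / nj) * (n[1] / nj) * (nj - dj) / (nj - 1)
        n[0] -= removed[0]
        n[1] -= removed[1]
    if var == 0:
        return 1.0  # no events: nothing to compare
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of chi-square with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))

# Toy check: identical groups should yield a nonsignificant P value (~1.0).
grp_t, grp_e = [5, 8, 12, 20], [1, 1, 0, 1]
p = logrank_p(grp_t, grp_e, grp_t, grp_e)
```

In the study's setup, two such curves (one from the LLM's generated analysis, one from the ground-truth data) would be compared with this test, with a large P value indicating the curves are statistically indistinguishable.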