Automated extraction of temporalized tumor evolution from oncology EMRs using natural language processing

Abstract

BACKGROUND: Extracting temporally sensitive outcomes such as tumor progression from unstructured electronic medical records (EMRs) remains a major challenge in oncology. This study evaluates a domain-adapted natural language processing (NLP) pipeline designed to extract structured, temporally anchored clinical outcomes from narrative EMR data.

PATIENTS AND METHODS: Patients with oncogene-addicted advanced or metastatic non-small-cell lung cancer (NSCLC) treated with oral targeted therapies between January 2020 and June 2023 at a French academic hospital were included. Extracted facts were benchmarked against expert annotations. All outputs were mapped to Observational Medical Outcomes Partnership (OMOP) vocabularies. F1-scores were calculated for correct concept detection, both without and with temporality. Real-world progression-free survival (rwPFS) was estimated from the retrieved clinical outcomes.

RESULTS: Among 1030 NSCLC patients treated between 2020 and 2023, 112 were confirmed to have advanced or metastatic disease with an oncogenic driver mutation, primarily EGFR (n = 66), ALK (n = 23), and KRAS (n = 16). The NLP pipeline performed well in extracting clinical concepts, with an F1-score of 79.7% for tumor evolution concepts, falling to 62.0% when temporality was included. Across all domains, F1-scores reached 76.5% for concept extraction and 63.7% with temporality. Median rwPFS was 21.9 months for EGFR-mutated, 52.4 months for ALK-translocated, and 5.0 months for KRAS-mutant tumors, in line with published benchmarks. Reviewing automatically collected data was 5.8 times faster than manual collection.

CONCLUSIONS: Our solution demonstrates robust performance for extracting temporally structured tumor outcomes from EMRs and supports the reconstruction of real-world endpoints in oncology.
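
The two F1-scores reported in the abstract differ only in what counts as a correct match. The sketch below is a minimal illustration, not the authors' evaluation code: the `Fact` record, its fields, and the rule that a temporal match requires an exact date are assumptions made for this example (an actual benchmark may allow a date tolerance or use a different matching scheme). It computes micro-averaged F1 as 2PR/(P+R), where P and R are precision and recall over matched facts.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Fact:
    """Hypothetical record for one extracted or expert-annotated fact."""
    patient_id: str
    concept: str                # e.g. a label mapped to an OMOP concept (assumed)
    date: Optional[str] = None  # ISO date anchoring the fact in time (assumed)


def f1_score(predicted: Iterable[Fact], gold: Iterable[Fact],
             with_temporality: bool = False) -> float:
    """Micro-averaged F1 between predicted and gold facts.

    Without temporality, a prediction matches a gold fact when patient and
    concept agree. With temporality, the date must also agree exactly
    (an assumption; a real benchmark may tolerate small date offsets).
    Duplicate facts collapse to a single match key.
    """
    def key(fact: Fact) -> tuple:
        if with_temporality:
            return (fact.patient_id, fact.concept, fact.date)
        return (fact.patient_id, fact.concept)

    pred_keys = {key(f) for f in predicted}
    gold_keys = {key(f) for f in gold}
    true_positives = len(pred_keys & gold_keys)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_keys)
    recall = true_positives / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration, not from the study).
gold = [
    Fact("p1", "progression", "2021-03-10"),
    Fact("p1", "response", "2020-09-01"),
    Fact("p2", "progression", "2022-01-15"),
]
pred = [
    Fact("p1", "progression", "2021-03-10"),
    Fact("p1", "response", "2020-10-01"),        # right concept, wrong date
    Fact("p2", "progression", "2022-01-15"),
    Fact("p2", "stable disease", "2021-06-01"),  # spurious fact
]

print(round(f1_score(pred, gold), 3))                         # 0.857
print(round(f1_score(pred, gold, with_temporality=True), 3))  # 0.571
```

The misdated "response" fact still counts as correct at the concept level but not once dates must match, which shows in miniature why scores drop when temporality is included.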

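The abstract reports median rwPFS per driver mutation but does not name the estimator. Kaplan-Meier is the usual choice for progression-free survival with censored follow-up, so the sketch below assumes it; the function name `km_median` and the toy cohort are hypothetical. It uses a common convention for the median: the earliest time at which the estimated survival probability falls to 0.5 or below.

```python
from typing import Optional, Sequence


def km_median(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Median survival time from a Kaplan-Meier (product-limit) estimate.

    times: follow-up in months from treatment start (assumed unit)
    events: 1 if progression or death was observed, 0 if censored
    Returns None when the survival curve never reaches 0.5.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing this time; censored ones count as at risk at t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            if survival <= 0.5:
                return t
        at_risk -= n_leaving
    return None


# Hypothetical cohort of eight patients (not study data).
times = [3, 5, 5, 8, 12, 15, 21, 24]
events = [1, 1, 0, 1, 0, 1, 1, 0]
print(km_median(times, events))  # 15
```

A survival library such as lifelines (`KaplanMeierFitter`) would also provide confidence intervals; the hand-rolled version only shows the product-limit arithmetic behind a reported median.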