Clinical Information Extraction From Notes of Veterans With Lymphoid Malignancies: Natural Language Processing Study

Abstract

BACKGROUND: Clinical natural language processing (cNLP) techniques are commonly developed and used to extract information from clinical notes to facilitate clinical decision-making and research. However, they are less established for rare diseases such as lymphoid malignancies due to the lack of annotated data as well as the heterogeneity and complexity of how clinical information is documented. In addition, there is increasing evidence that cNLP techniques may be prone to biases embedded in clinical documentation or model development. These biases can result in disparities in performance when extracting clinical information or predicting patient outcomes.

OBJECTIVE: This study aims to report the development and validation of a cNLP pipeline that extracts clinical information such as performance status, staging, and diagnosis, as well as less common information such as substance use and military environmental exposures, from the clinical notes of veterans with lymphoid malignancies.

METHODS: We developed a rule-based cNLP pipeline that integrates domain expertise. We tested and compared the performance of the cNLP pipeline on notes from 2 veteran patient cohorts: one from non-Hispanic White veterans and the other from non-Hispanic Black veterans.

RESULTS: Overall, our pipeline achieved promising performance on our study data, especially for extracting entities that have standard clinical documentation, such as performance status. We also found that while the pipeline has robust performance across the two patient groups, the false-positive and false-negative rates were significantly associated with race for detecting the primary diagnosis (P=.001 for both); the false-negative rate was significantly associated with race for identifying substance use (P=.02).

CONCLUSIONS: The system exhibits satisfying and comparable performance for most clinical entities of interest except for (1) the primary diagnosis and (2) substance use. Future work will address the challenges encountered in developing and deploying the cNLP pipeline on the Department of Veterans Affairs data for rare cancers and enhance the performance of cNLP systems to avoid biases.
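
The abstract does not list the pipeline's rules or name the statistical test behind the reported P values, so the following is only a minimal illustrative sketch in Python of the two ideas it mentions: a rule-based extractor for a consistently documented entity (performance status) and per-cohort false-positive and false-negative rates. The regular expression, the function names `extract_ecog` and `error_rates`, and the cohort labels are all hypothetical and are not taken from the study.

```python
import re
from collections import defaultdict

# Hypothetical rule (not the study's): "ECOG", optionally followed by
# "PS" or "performance status" and a connector, then a score from 0 to 4.
ECOG_RULE = re.compile(
    r"\bECOG(?:\s+(?:PS|performance\s+status))?\s*(?:score|of|is|=|:)?\s*([0-4])\b",
    re.IGNORECASE,
)


def extract_ecog(note):
    """Return the first ECOG score found in the note as an int, or None."""
    match = ECOG_RULE.search(note)
    return int(match.group(1)) if match else None


def error_rates(records):
    """Compute false-positive and false-negative rates per cohort.

    `records` is an iterable of (cohort, gold, predicted) tuples, where
    `gold` and `predicted` are booleans for a binary entity (for example,
    whether a note documents the entity at all).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for cohort, gold, predicted in records:
        if gold:
            key = "tp" if predicted else "fn"
        else:
            key = "fp" if predicted else "tn"
        counts[cohort][key] += 1

    rates = {}
    for cohort, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["fn"] + c["tp"]
        rates[cohort] = {
            # A rate is undefined (None) when its denominator is zero.
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


# Invented toy records, for illustration only; they are not study data.
toy_records = [
    ("cohort_1", True, True),
    ("cohort_1", True, False),
    ("cohort_1", False, False),
    ("cohort_1", False, True),
    ("cohort_2", True, True),
    ("cohort_2", True, True),
    ("cohort_2", False, False),
    ("cohort_2", False, False),
]
toy_rates = error_rates(toy_records)
```

Comparing the resulting rates between cohorts would still need a significance test on the underlying counts; the sketch stops at the rates because the abstract does not say which test the authors used.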
