Clinical text mining of the performance status and progression-free survival to facilitate data collection in cancer research: an exploratory study

Abstract

BACKGROUND: Modern electronic medical records (EMRs) contain a large amount of valuable data. These data can be unlocked for research by manual data collection, but this is highly labor intensive. We therefore explored whether automated text mining (TM) could be used to extract the performance status (PS) and progression-free survival (PFS) in a cohort of 328 non-small-cell lung cancer patients.

MATERIALS AND METHODS: Unstructured Dutch text data were derived from different EMR fields containing mainly information recorded during outpatient visits. A rule-based TM approach using regular expressions, implemented in the R programming language, was used to extract PS and PFS. For PS, quantitative evaluation metrics, such as the weighted F1-score, were used to determine the accuracy of the TM-extracted data against the manually curated data. For PFS, the median PFS of the TM-curated data and the manually curated data was estimated with the Kaplan-Meier method and compared. In addition, the C-index was determined.

RESULTS: A PS was obtained for 196 patients (60%) using the TM approach. In 189 of these patients (96%), the TM-curated PS matched the manually curated PS. The weighted F1-score was 96.5%. The median PFS was 7.42 months for the manually curated data (n = 328) and 8.00 months for the TM-curated data (n = 301). The C-index was 0.916.

CONCLUSIONS: The developed TM approach extracts PS and PFS from the EMR with very good performance. This approach therefore increases the efficiency of reliable data collection from EMRs, facilitating the use of real-world data (RWD) in clinical research.
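
To make the PS part of the method concrete, the following is a minimal sketch of rule-based extraction with a regular expression, followed by a weighted F1-score against manual labels. It is written in Python for illustration, whereas the study used R. The pattern, the example notes, and the function names extract_ps and weighted_f1 are all assumptions made here; the abstract does not give the authors' actual rules or state how patients without a TM-extracted PS entered the metric.

```python
import re
from collections import Counter

# Hypothetical pattern, not the authors' rule set: matches mentions such as
# "WHO 1", "ECOG: 2", "WHO PS 0" or "performance status 1" and captures the score.
PS_PATTERN = re.compile(
    r"\b(?:WHO|ECOG|performance\s+status|PS)\b"
    r"(?:[\s\-]*(?:PS|performance\s+status|score))?"
    r"\s*[:=]?\s*([0-4])\b",
    re.IGNORECASE,
)


def extract_ps(note):
    """Return the first performance status (0-4) found in a note, or None."""
    match = PS_PATTERN.search(note)
    return int(match.group(1)) if match else None


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * count / total
    return score


# Invented example notes (Dutch-style free text), for illustration only.
notes = [
    "Patiënt in goede conditie, WHO PS 1.",
    "ECOG: 2, vermoeidheid toegenomen.",
    "Geen klachten, controle over 3 maanden.",
]
extracted = [extract_ps(n) for n in notes]
```

Notes without a recognizable mention yield None, which mirrors the fact that the TM approach obtained a PS for only part of the cohort. In this sketch such cases would have to be excluded or treated as a separate class before calling weighted_f1; which of the two the study did is not stated in the abstract.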
