Applying text-mining to clinical notes: the identification of patient characteristics from electronic health records (EHRs)

Abstract

BACKGROUND: Clinical notes contain information on critical patient characteristics that, if overlooked, could increase the risk of adverse events and of miscommunication between healthcare professionals and patients. This study investigates the feasibility of using text-mining to extract patient characteristics from electronic health records (EHRs) and compares the effectiveness of text-mining with that of human review in identifying four patient characteristics: language barrier, living alone, cognitive frailty and non-adherence.

METHODS: A manual gold standard was created from 1,120 patient files (878 patients) with unplanned hospital readmissions. Each patient was assigned to one or more of the four characteristics, supported by clinical notes extracted from their EHRs. A rule-based (RB) SQL query was used for characteristics described by simple terminology, and Named Entity Recognition (NER) models were used for those described by more complex terms. Model performance was compared against the manual standard. The primary outcomes were recall, specificity, precision, negative predictive value (NPV) and F1-score.

RESULTS: Performance for each patient characteristic was evaluated on a separate train/test dataset, and an additional validation dataset was used for the NER models. On the train/test set, the language barrier RB query achieved a recall of 0.99 (specificity 0.96). The living alone NER model achieved a recall of 0.86 (specificity 0.94) on the train/test set and a recall of 0.81 (specificity 1.00) on the validation set. The cognitive frailty model yielded a recall of 0.59 (specificity 0.76) on the train/test set and a recall of 0.73 (specificity 0.96) on the validation set. The non-adherence NER model achieved a recall of 0.75 (specificity 0.99) on the train/test set and a recall of 0.90 (specificity 0.99) on the validation set.
The models tended to overestimate the presence of patient characteristics, for example by attributing a family member's language barrier to the patient.

CONCLUSION: This study successfully demonstrated the feasibility of applying text-mining to identify patient characteristics from EHRs. The results also suggest that, for more complex terminology, NER models outperform the rule-based approach. Future work involves refining these models for broader application in clinical settings. CLINICAL TRIAL NUMBER: Not applicable.
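The abstract does not give the rule-based SQL query, its search terms, or how the metrics were computed. As a minimal sketch only, the Python example below assumes a hypothetical `notes` table (`patient_id`, `note_text`) in an in-memory SQLite database and a few illustrative English search patterns; `flag_patients`, `evaluate`, `ratio` and `LANGUAGE_BARRIER_PATTERNS` are hypothetical names, not the authors' code. It flags patients whose notes match any pattern, then derives the five outcome measures from patient-level counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) against a manual gold standard: recall = TP/(TP+FN), specificity = TN/(TN+FP), precision = TP/(TP+FP), NPV = TN/(TN+FN), and F1 = 2 · precision · recall / (precision + recall).

```python
import sqlite3

# Hypothetical search patterns for a "language barrier" rule; the study's
# actual terms are not given in the abstract.
LANGUAGE_BARRIER_PATTERNS = ["%interpreter%", "%does not speak%", "%language barrier%"]


def flag_patients(conn: sqlite3.Connection, patterns: list[str]) -> set[int]:
    """Return IDs of patients with at least one note matching any pattern."""
    if not patterns:
        return set()
    # One LIKE clause per pattern on the lower-cased note text, OR-ed together,
    # with the patterns passed as bound parameters.
    where = " OR ".join("lower(note_text) LIKE ?" for _ in patterns)
    query = f"SELECT DISTINCT patient_id FROM notes WHERE {where}"
    return {row[0] for row in conn.execute(query, patterns)}


def ratio(num: float, den: float) -> float:
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(predicted: set[int], gold: set[int], all_ids: set[int]) -> dict[str, float]:
    """Compare rule output with a manual gold standard at patient level."""
    tp = len(predicted & gold)            # flagged and present in gold standard
    fp = len(predicted - gold)            # flagged but absent in gold standard
    fn = len(gold - predicted)            # present in gold standard but missed
    tn = len(all_ids - predicted - gold)  # neither flagged nor present
    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy example with invented notes (not study data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (patient_id INTEGER, note_text TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [
        (1, "Interpreter required during consultation."),
        (2, "Patient lives alone; daughter visits weekly."),
        (3, "Son does not speak English; patient is fluent."),
        (4, "No issues reported."),
    ],
)
predicted = flag_patients(conn, LANGUAGE_BARRIER_PATTERNS)
gold = {1}  # manual annotation: only patient 1 has a language barrier
metrics = evaluate(predicted, gold, {1, 2, 3, 4})
print(sorted(predicted))  # [1, 3]
print(metrics)
```

In this toy data, patient 3 is flagged because the note says the son does not speak English, a false positive of the same kind the authors report (a family member's language barrier attributed to the patient). Plain keyword matching cannot tell who a phrase refers to, which may be one reason context-aware NER models suited the more complex characteristics better.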
