Natural Language Processing of Serum Protein Electrophoresis Reports in the Veterans Affairs Health Care System



Abstract

PURPOSE: Serum protein electrophoresis (SPEP) is a clinical tool used to screen for monoclonal gammopathy, which makes it critical in the evaluation of patients with multiple myeloma. However, SPEP laboratory results are usually returned as short text reports, which are not amenable to simple computerized processing for large-scale studies. We applied natural language processing (NLP) to detect monoclonal gammopathy in SPEP laboratory results and compared its performance at multiple hospitals using both a manual rules-based system and a machine-learning algorithm.

METHODS: We used data from the VA Corporate Data Warehouse, which comprises data from 20 million unique individuals. SPEP reports were collected from July to December 2015 at 5 Veterans Affairs Medical Centers. Of these reports, we annotated 300 for the presence or absence of monoclonal gammopathy. We applied a machine learning-based NLP system and a manual rules-based NLP system to detect monoclonal gammopathy in SPEP reports at each of the hospitals, then applied the model from 1 hospital to each of the other hospitals.

RESULTS: The machine-learning system achieved an area under the receiver operating characteristic curve of 0.997, and the rules-based system achieved an accuracy of 0.99. When a model trained on 1 hospital's data was applied to a different hospital, however, accuracy varied greatly, and the learning-based models performed better than the rules-based model.

CONCLUSION: Binary classification of short clinical texts such as SPEP reports may be a particularly attractive target on which to train highly accurate NLP systems.
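To make the rules-based approach concrete, the following is a minimal sketch of how a binary detector for monoclonal gammopathy in short report text might look. The abstract does not give the study's actual rules, so the regular expressions, the function names `detect_monoclonal_gammopathy` and `accuracy`, and the example reports are all assumptions invented for illustration; none of them come from the study or its data.

```python
import re

# Hypothetical patterns: the study's real rules are not described in the abstract.
POSITIVE = re.compile(
    r"\b(monoclonal (band|protein|spike|gammopathy|component)|m[- ]spike|paraprotein)\b",
    re.IGNORECASE,
)
NEGATED = re.compile(
    r"\b(no|without|negative for|absence of)\b.*?\b(monoclonal|m[- ]spike|paraprotein)",
    re.IGNORECASE,
)


def detect_monoclonal_gammopathy(report: str) -> bool:
    """Return True if any sentence mentions a monoclonal finding that is not negated."""
    # Split on sentence-ending punctuation followed by whitespace, or on newlines,
    # so that decimals such as "1.2" stay inside one sentence.
    for sentence in re.split(r"[.;]\s+|\n+", report):
        if POSITIVE.search(sentence) and not NEGATED.search(sentence):
            return True
    return False


def accuracy(predictions, labels):
    """Fraction of predictions that match the annotated labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Invented example reports with invented labels, not taken from the study.
toy_reports = [
    ("A monoclonal band is present in the gamma region.", True),
    ("No monoclonal protein detected. Normal pattern.", False),
    ("Polyclonal hypergammaglobulinemia; no M-spike seen.", False),
    ("M-spike of 1.2 g/dL identified in the gamma region.", True),
]

preds = [detect_monoclonal_gammopathy(text) for text, _ in toy_reports]
print(accuracy(preds, [label for _, label in toy_reports]))
```

Hand-written rules like these encode one site's phrasing, which is one plausible reason such a system could lose accuracy when moved to a hospital whose reports are worded differently.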
