An integrated approach for rare disease detection and classification in Spanish pediatric medical reports



Abstract

Detecting and classifying rare diseases is one of the most significant challenges in applying Natural Language Processing techniques to the analysis and extraction of information from biomedical texts. In this paper, we present a novel study focused on the detection and classification of rare diseases in clinical notes extracted from a cohort of pediatric patients from the Community of Madrid, Spain. From a set of collected and anonymized medical records, we propose a semi-supervised, keyphrase-based system that performs an initial detection of rare disease mentions; experts then validate and refine these mentions to build a consolidated dataset covering a subset of rare diseases. Based on this dataset, we carry out a series of experiments on rare disease classification using both a semi-supervised technique and state-of-the-art supervised systems based on discriminative and generative models. A detailed case analysis provides insights into which systems excel in specific scenarios and why. The validated dataset contains a total of 1900 annotated texts with mentions of rare diseases. Experiments on this dataset show that the best supervised models outperform the semi-supervised system by more than 10 percentage points (78.74% vs 67.37% micro-average F-Measure), individually improving the classification of a significant number of diseases in the dataset. State-of-the-art supervised systems offer promising results on the detection and classification of rare diseases in clinical texts, even for cases in which the amount of annotated data is low. Semi-supervised models, in turn, show interesting capabilities for dealing with the limited information and data available in the field.
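For readers unfamiliar with the reported metric, the following is a minimal sketch of how a micro-average F-Measure is typically computed: true positives, false positives, and false negatives are pooled over all disease classes before precision and recall are derived. It is not the authors' evaluation code. The function name `micro_f1`, the set-of-labels representation, and the placeholder labels `disease_a`, `disease_b`, and `disease_c` are assumptions made purely for illustration.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Micro-averaged F-measure over documents.

    Each document carries a set of labels (hypothetical representation),
    so the same function covers single-label and multi-label settings.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        tp += len(gold_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - gold_labels)   # labels predicted but not in gold
        fn += len(gold_labels - pred_labels)   # gold labels that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data with placeholder labels; these are not results from the paper.
gold = [{"disease_a"}, {"disease_b"}, {"disease_a"}, {"disease_c"}]
pred = [{"disease_a"}, {"disease_a"}, {"disease_a"}, {"disease_c"}]
score = micro_f1(gold, pred)
```

Because the counts are pooled before averaging, frequent diseases weigh more heavily in a micro-averaged score than rare ones, which is worth keeping in mind when comparing systems on a dataset with an uneven class distribution.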
