Semantic classification of biomedical concepts using distributional similarity

Abstract

OBJECTIVE: To develop an automated, high-throughput, and reproducible method for reclassifying and validating ontological concepts for natural language processing applications.

DESIGN: We developed a distributional similarity approach to classify Unified Medical Language System (UMLS) concepts. Classification models were built for seven broad, biomedically relevant semantic classes created by grouping subsets of the UMLS semantic types. We used contextual features based on syntactic properties obtained from two different large corpora, with alpha-skew divergence as the similarity measure.

MEASUREMENTS: The test sets were generated automatically from the changes the National Library of Medicine made to the semantic classification of concepts between the UMLS 2005AA and 2006AA releases. Error rates were calculated and a misclassification analysis was performed.

RESULTS: The lowest estimated error rates were 0.198 when the correct classification had to be our top prediction and 0.116 when it had to be among our top 2 predictions.

CONCLUSION: The results demonstrate that the distributional similarity approach can recommend high-level semantic classifications suitable for use in natural language processing.
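To make the similarity measure concrete, the following is a minimal sketch of alpha-skew divergence applied to context-feature counts. It assumes the commonly used definition, the Kullback-Leibler divergence of r from the mixture alpha*q + (1 - alpha)*r. The abstract does not state the value of alpha, the feature representation, the argument order, or the class names, so `alpha=0.99`, the feature strings, the labels "Disorder" and "Chemical", and the helper names are all illustrative assumptions rather than the authors' implementation.

```python
import math


def normalize(counts):
    """Turn raw context-feature counts into a probability distribution."""
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()}


def alpha_skew_divergence(q, r, alpha=0.99):
    """KL(r || alpha*q + (1 - alpha)*r); lower means more similar.

    Mixing a little of r into q keeps the mixture non-zero wherever
    r is non-zero, so the divergence stays finite for sparse data.
    alpha=0.99 is an assumed value, not one given in the abstract.
    """
    divergence = 0.0
    for feature, r_f in r.items():
        if r_f == 0.0:
            continue
        mixture = alpha * q.get(feature, 0.0) + (1.0 - alpha) * r_f
        divergence += r_f * math.log(r_f / mixture)
    return divergence


def rank_classes(concept_counts, class_counts, alpha=0.99):
    """Rank candidate classes for a concept, most similar first."""
    r = normalize(concept_counts)
    scores = {
        label: alpha_skew_divergence(normalize(counts), r, alpha)
        for label, counts in class_counts.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical toy data: syntactic context features and their counts.
class_counts = {
    "Disorder": {"diagnose_obj": 8, "treat_obj": 6, "inject_obj": 1},
    "Chemical": {"inject_obj": 7, "administer_obj": 9, "treat_obj": 1},
}
concept_counts = {"diagnose_obj": 3, "treat_obj": 2}

ranking = rank_classes(concept_counts, class_counts)
print(ranking)
```

Under this sketch, a top-1 prediction is `ranking[0]` and a top-2 prediction set is `ranking[:2]`, which mirrors the two ways the error rates are reported.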
