Replacing non-biomedical concepts improves embedding of biomedical concepts



Abstract

OBJECTIVES: Concept embeddings are low-dimensional vector representations of concepts such as MeSH:D009203 (Myocardial Infarction), whose similarity in the embedded vector space reflects their semantic similarity. Here, we test the hypothesis that replacing synonyms of non-biomedical concepts can improve the quality of biomedical concept embeddings. MATERIALS AND METHODS: We developed an approach that leverages WordNet to replace each set of synonyms with the most common representative of that synonym set. RESULTS: We tested our approach on 1055 concept sets and found that, on average, the mean intra-cluster distance in the vector space was reduced by 8%. Assuming that homophily of related concepts in the vector space is desirable, our approach tends to improve the quality of embeddings. DISCUSSION AND CONCLUSION: This pilot study shows that non-biomedical synonym replacement tends to improve the quality of embeddings of biomedical concepts produced with the Word2Vec algorithm. We have implemented our approach in a freely available Python package at https://github.com/TheJacksonLaboratory/wn2vec.
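
The two ideas in the abstract, collapsing each synonym set to its most common member and scoring a concept set by its mean intra-cluster distance, can be illustrated with a minimal Python sketch. This is not the wn2vec implementation: `TOY_SYNSETS` is a hypothetical stand-in for WordNet synsets, all function names are invented for illustration, and cosine distance is an assumption, since the abstract does not name the distance measure.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy synonym sets standing in for WordNet synsets (assumption).
TOY_SYNSETS = [
    {"big", "large"},
    {"illness", "sickness"},
]


def build_replacement_map(tokens, synsets):
    """Map each synonym found in the corpus to the most frequent member of its set."""
    counts = Counter(tokens)
    mapping = {}
    for synset in synsets:
        present = [word for word in synset if counts[word] > 0]
        if len(present) < 2:
            continue  # nothing to merge for this synonym set
        # Most frequent word wins; ties are broken alphabetically (an assumption).
        representative = min(present, key=lambda word: (-counts[word], word))
        for word in present:
            mapping[word] = representative
    return mapping


def replace_synonyms(tokens, mapping):
    """Replace mapped tokens; everything else (e.g. biomedical concept IDs) is kept."""
    return [mapping.get(token, token) for token in tokens]


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_intra_cluster_distance(vectors):
    """Mean pairwise cosine distance among the embeddings of one concept set."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


corpus = "a big tumor caused a large lesion and a big scar".split()
synonym_map = build_replacement_map(corpus, TOY_SYNSETS)
normalized = replace_synonyms(corpus, synonym_map)
```

In this toy corpus "big" occurs more often than "large", so every "large" is rewritten to "big" before training. Under the abstract's hypothesis, embeddings trained on the normalized text would yield a lower `mean_intra_cluster_distance` for related concepts than embeddings trained on the original text.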
