Unbiased microRNA-Disease Association Prediction Using ICD-11 Codes and Negative Sampling



Abstract

We developed a computational model, called "Unbiased microRNA-disease association predictor (UBMDA)," to predict microRNA-disease associations. UBMDA differs from previously reported models in two major ways. First, it does not rely on similarity-based feature extraction, which is the main basis of previous studies. Instead, it takes International Classification of Diseases 11th Revision (ICD-11) disease codes and microRNA nucleotide sequences as input features. UBMDA can therefore be applied to newly discovered or poorly studied microRNAs and diseases. Second, we constructed an appropriate negative sample dataset. A positive sample dataset of microRNA-disease pairs with proven associations is publicly available, but datasets reporting the absence of an association between a microRNA and a disease are rare. A negative sample dataset was therefore created by pairing microRNAs with diseases. Because more commonly studied microRNAs and diseases are more likely to appear in the positive sample dataset, building the negative sample dataset without accounting for this bias could leave microRNA and disease frequencies imbalanced between the positive and negative sample datasets, leading to biased prediction. To prevent such an imbalance, we created the negative sample dataset with reference to the frequency of each microRNA and disease in the positive sample dataset, so that these frequencies were similar in the two datasets. The resulting model has a simple and intuitive structure. UBMDA will contribute to accelerating the development of microRNA-related biomarkers and therapeutics.
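
The frequency-matched negative sampling described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the function name `sample_negatives`, its parameters, and the choice to draw each microRNA and each disease with probability proportional to its count in the positive dataset, rejecting any pair already known to be positive. It is not the authors' implementation.

```python
import random
from collections import Counter


def sample_negatives(positives, n_negatives, seed=0, max_tries=100_000):
    """Draw non-positive (microRNA, disease) pairs whose microRNA and disease
    frequencies roughly follow those of the positive dataset.

    Hypothetical helper: the weighting scheme is an assumption, not the
    procedure used in the paper.
    """
    rng = random.Random(seed)
    positive_set = set(positives)

    # Count how often each microRNA and each disease occurs among positives.
    mirna_counts = Counter(m for m, _ in positives)
    disease_counts = Counter(d for _, d in positives)
    mirnas, mirna_weights = zip(*mirna_counts.items())
    diseases, disease_weights = zip(*disease_counts.items())

    negatives = set()
    tries = 0
    while len(negatives) < n_negatives and tries < max_tries:
        tries += 1
        # Frequently studied entities are drawn more often, mirroring the positives.
        m = rng.choices(mirnas, weights=mirna_weights)[0]
        d = rng.choices(diseases, weights=disease_weights)[0]
        # Reject known associations; the set also discards repeated pairs.
        if (m, d) not in positive_set:
            negatives.add((m, d))
    return sorted(negatives)
```

Called with a list of positive pairs such as `[("miR-A", "D1"), ("miR-A", "D2"), ("miR-B", "D1")]` (placeholder identifiers, not real microRNA names or ICD-11 codes), the function returns up to `n_negatives` pairs that are absent from the positive list. Because duplicate pairs are discarded, the match to the positive frequencies is only approximate, and the loop may return fewer pairs than requested when few non-positive combinations exist.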
