Benchmark data set for breast cancer associated genes



Abstract

Breast cancer is one of the leading causes of death in women worldwide. Contributing causes could include inheritance, changes in environmental conditions, or mutations in certain cancer-causing genes. The genes involved are far from few: a wide range of genes are implicated in the development and progression of the different stages of breast cancer. In this article, we explore the association of genes with breast cancer and classify them into association classes, viz. positive, negative and neutral. Among the biomedical literature resources available for a disease, HuGE Navigator is a major one, comprising continually updated human genome epidemiology data maintained by the Centers for Disease Control and Prevention. However, the literature finder module of HuGE Navigator yields only PubMed IDs for a specific disease, so these IDs are used to retrieve the corresponding abstracts from PubMed. The abstracts are then filtered to retain only reference sentences that contain at least one gene term and one disease term. Using these reference sentences, double-fold cross-validation was applied to compile a comprehensive list of genes, which were then classified as positive, negative or neutral, together with the reference sentences confirming the association of each gene with the disease. The positively associated data generated here can be used for breast cancer modelling or for meta-analysis of breast cancer. The data generated in the present work can also serve as standard reference data for training text mining-based biological literature classifiers that predict the class of published literature, not only for breast cancer but for other diseases as well.
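
The sentence-filtering step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the gene and disease lexicons (GENE_TERMS, DISEASE_TERMS), the naive punctuation-based sentence splitter, and all function names are assumptions made for illustration. It only shows the idea of keeping sentences that mention at least one gene and at least one disease term.

```python
import re

# Hypothetical lexicons for illustration only; not the lists used by the authors.
GENE_TERMS = {"BRCA1", "BRCA2", "TP53"}
DISEASE_TERMS = {"breast cancer", "breast carcinoma"}


def split_sentences(abstract):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [p for p in parts if p]


def find_genes(sentence):
    """Return the sorted gene symbols from GENE_TERMS that appear as whole tokens."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
    return sorted(GENE_TERMS & tokens)


def mentions_disease(sentence):
    """Case-insensitive substring check against the disease lexicon."""
    lowered = sentence.lower()
    return any(term in lowered for term in DISEASE_TERMS)


def reference_sentences(abstract):
    """Keep sentences containing at least one gene term and one disease term."""
    results = []
    for sentence in split_sentences(abstract):
        genes = find_genes(sentence)
        if genes and mentions_disease(sentence):
            results.append((sentence, genes))
    return results


if __name__ == "__main__":
    example = (
        "BRCA1 mutations increase the risk of breast cancer. "
        "TP53 was sequenced in all samples. "
        "Breast cancer incidence is rising."
    )
    for sentence, genes in reference_sentences(example):
        print(genes, "|", sentence)
```

In this toy input only the first sentence is kept, because it is the only one that names both a gene and the disease. A real pipeline would need a curated gene nomenclature, synonym handling, and a more robust sentence splitter; assigning the positive, negative or neutral class is a separate step that this sketch does not attempt.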
