Integrated multiple microarray studies by robust rank aggregation to identify immune-associated biomarkers in Crohn's disease based on three machine learning methods.

Authors: Chen Zi-An, Ma Hui-Hui, Wang Yan, Tian Hui, Mi Jian-Wei, Yao Dong-Mei, Yang Chuan-Jie
Crohn's disease (CD) is a complex autoimmune disorder presumed to be driven by interactions among genetic, immune, microbial, and environmental factors. The intrinsic molecular mechanisms of CD, however, remain poorly understood. Identifying novel biomarkers of CD from larger sample sets through machine learning approaches may inform the diagnosis and treatment of the disease. A comprehensive analysis was conducted on all CD datasets in the Gene Expression Omnibus (GEO); the robust rank aggregation (RRA) method was then used to identify differentially expressed genes (DEGs) between controls and CD patients. Protein-protein interaction (PPI) network and functional enrichment analyses were performed to investigate the potential functions of the DEGs, and molecular complex detection (MCODE) was used to identify important functional modules in the PPI network. Three machine learning algorithms, support vector machine-recursive feature elimination (SVM-RFE), random forest (RF), and least absolute shrinkage and selection operator (LASSO), were applied to determine characteristic genes, which were verified by ROC curve analysis and by immunohistochemistry (IHC) on clinical samples. Univariable and multivariable logistic regression were used to establish a machine learning score for diagnosis. Single-sample GSEA (ssGSEA) was performed to examine the correlation between immune infiltration and the biomarkers. In total, 5 datasets met the inclusion criteria: GSE75214, GSE95095, GSE126124, GSE179285, and GSE186582. Based on the RRA integrated analysis, 203 significant DEGs were identified (120 upregulated and 83 downregulated), and MCODE revealed important functional modules in the PPI network. Machine learning identified LCN2, REG1A, AQP9, CCL2, GIP, PROK2, DEFA5, CXCL9, and NAMPT; of these, AQP9, PROK2, LCN2, and NAMPT were further verified by ROC curves and by IHC in the external cohort.
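As a rough illustration of how robust rank aggregation can combine per-dataset gene rankings, the following Python sketch computes an RRA-style score from order statistics of normalized ranks. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names (`beta_score`, `rho_score`, `aggregate`) are hypothetical, the toy ranked lists are invented for the example and are not data from the study, and genes absent from a list are assumed to be ranked last.

```python
from math import comb


def beta_score(r, k, n):
    """Probability that the k-th smallest of n uniform(0, 1) draws is <= r."""
    return sum(comb(n, l) * r**l * (1 - r) ** (n - l) for l in range(k, n + 1))


def rho_score(normalized_ranks):
    """Smallest order-statistic probability, Bonferroni-corrected by the list count."""
    n = len(normalized_ranks)
    ordered = sorted(normalized_ranks)
    best = min(beta_score(r, k, n) for k, r in enumerate(ordered, start=1))
    return min(1.0, best * n)


def aggregate(ranked_lists):
    """Return (gene, score) pairs sorted by score; lower means more consistently top-ranked."""
    genes = set().union(*ranked_lists)
    scores = {}
    for gene in genes:
        ranks = []
        for ranked in ranked_lists:
            if gene in ranked:
                # Normalized rank in (0, 1]: position divided by list length.
                ranks.append((ranked.index(gene) + 1) / len(ranked))
            else:
                # Assumption: a gene missing from a list is treated as ranked last.
                ranks.append(1.0)
        scores[gene] = rho_score(ranks)
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy rankings (one list per dataset), for illustration only.
toy_lists = [
    ["LCN2", "AQP9", "NAMPT", "GIP"],
    ["AQP9", "LCN2", "GIP", "NAMPT"],
    ["LCN2", "NAMPT", "AQP9", "GIP"],
]
toy_result = aggregate(toy_lists)
```

A gene that sits near the top of most lists receives a small score, while a gene whose positions look like random draws receives a score near 1. A real analysis would use each dataset's full differential-expression ranking rather than four-gene lists.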
The final machine learning score was defined as [Expression level of AQP9 × (2.644)] + [Expression level of LCN2 × (0.958)] + [Expression level of NAMPT × (1.115)]. ssGSEA showed markedly elevated levels of dendritic cells and innate immune cells, such as macrophages and NK cells, in CD, consistent with the gene enrichment results showing that the DEGs are mainly involved in the IL-17 signaling pathway and the humoral immune response. The biomarkers selected by the RRA method and machine learning are highly reliable. These findings improve our understanding of the molecular mechanisms of CD pathogenesis.
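The score defined above is a weighted sum of three expression levels, which can be sketched in a few lines of Python. The coefficients are taken from the abstract; the function name `machine_learning_score` and the sample expression values are hypothetical and chosen only to show the arithmetic. The abstract does not state the expression units, any intercept, or a diagnostic cutoff, so none is assumed here.

```python
# Coefficients as reported in the abstract.
COEFFICIENTS = {"AQP9": 2.644, "LCN2": 0.958, "NAMPT": 1.115}


def machine_learning_score(expression):
    """Weighted sum of AQP9, LCN2, and NAMPT expression levels.

    `expression` maps gene symbols to expression levels; the units and
    normalization are an assumption left to the caller.
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {', '.join(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression values for a single sample, for illustration only.
sample = {"AQP9": 1.0, "LCN2": 2.0, "NAMPT": 0.5}
sample_score = machine_learning_score(sample)
```

For the hypothetical sample, the sum is 2.644 × 1.0 + 0.958 × 2.0 + 1.115 × 0.5.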
