Integrating machine learning and bioinformatics approaches for identifying novel diagnostic gene biomarkers in colorectal cancer

Abstract

This study aimed to identify diagnostic gene biomarkers for colorectal cancer (CRC) by analyzing differentially expressed genes (DEGs) between tumor and adjacent normal samples in five colon cancer gene-expression profiles (GSE10950, GSE25070, GSE41328, GSE74602, and GSE142279) from the Gene Expression Omnibus (GEO) database. Intersecting the DEGs with the gene co-expression network module most strongly correlated with the gene expression patterns of tumor samples yielded 283 overlapping genes. Centrality measures were calculated for these genes in a protein-protein interaction (PPI) network reconstructed from STRING. LASSO logistic regression then identified eleven candidate diagnostic genes. Of these, nine (CDC25B, CDK4, IQGAP3, MMP1, MMP7, SLC7A5, TEAD4, TRIB3, and UHRF1) had area under the receiver operating characteristic curve (AUROC) values above 0.92 in both the training and validation sets. We evaluated the diagnostic performance of these nine genes with four machine learning algorithms: random forest (RF), support vector machine (SVM), artificial neural network (ANN), and gradient boosting machine (GBM). In the testing datasets (GSE21815 and GSE106582), all four algorithms achieved AUROC scores greater than 0.95, indicating high diagnostic performance of the nine-gene set. In addition, the nine genes were significantly correlated (P < 0.05) with twelve immune cell types: activated mast cells, M0, M1, and M2 macrophages, neutrophils, activated memory CD4 T cells, follicular helper T cells, CD8 T cells, resting memory CD4 T cells, memory B cells, plasma cells, and resting mast cells. These results strongly suggest that all nine genes have the potential to serve as reliable diagnostic biomarkers for CRC.
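A minimal, self-contained Python sketch can illustrate the gene-selection and evaluation steps described above: LASSO logistic regression, followed by single-gene AUROC and model AUROC on an independent cohort. It is not the authors' pipeline, and several choices are assumptions made here for illustration:

- The data are simulated.
- The gene names (GENE_A, GENE_B, NOISE_1 to NOISE_4) are hypothetical.
- The penalty strength lam=0.05 is fixed rather than tuned by cross-validation.
- The L1-penalized model is fitted by proximal gradient descent (ISTA). This solver was chosen for brevity; the abstract does not say which solver the study used.

AUROC is computed as the Mann-Whitney probability that a tumor sample outscores a normal sample. Single-gene AUROC is taken as max(a, 1 - a) so that down-regulated genes are scored by how well they discriminate, regardless of direction.

```python
import math
import random

# Hypothetical gene names for a toy example; these are NOT genes from the study.
GENES = ["GENE_A", "GENE_B", "NOISE_1", "NOISE_2", "NOISE_3", "NOISE_4"]


def auroc(scores, labels):
    """Rank-based AUROC: probability that a random tumor sample (label 1)
    scores higher than a random normal sample (label 0); ties count 0.5.
    This equals the Mann-Whitney U statistic divided by (n1 * n0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink toward zero, clipping to exactly 0.
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_standardizer(X):
    # Column means and population SDs, estimated on the training set only.
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / n) for j in range(p)]
    return lambda rows: [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def lasso_logistic(X, y, lam=0.05, lr=0.2, iters=1000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).
    Minimizes mean log-loss + lam * sum(|w_j|); the intercept is unpenalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding (the LASSO shrinkage).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def simulate(n_per_class, rng):
    # Toy cohort: GENE_A up-regulated and GENE_B down-regulated in tumors;
    # the remaining columns are pure noise.
    X, y = [], []
    for label in [1] * n_per_class + [0] * n_per_class:
        row = [rng.gauss(0.0, 1.0) for _ in GENES]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = simulate(30, rng)
X_test, y_test = simulate(20, rng)  # stands in for an independent test cohort
scale = fit_standardizer(X_train)
X_train, X_test = scale(X_train), scale(X_test)

w, b = lasso_logistic(X_train, y_train)
selected = [g for g, wj in zip(GENES, w) if wj != 0.0]

# Direction-agnostic single-gene AUROC on the training set.
single_gene_auc = {}
for j, g in enumerate(GENES):
    a = auroc([r[j] for r in X_train], y_train)
    single_gene_auc[g] = max(a, 1.0 - a)

# AUROC of the fitted LASSO model's linear score on the held-out cohort.
test_scores = [b + sum(wj * xj for wj, xj in zip(w, r)) for r in X_test]
test_auc = auroc(test_scores, y_test)

print("LASSO-selected genes:", selected)
print("single-gene AUROC:", {g: round(a, 3) for g, a in single_gene_auc.items()})
print("test-set AUROC of the LASSO model:", round(test_auc, 3))
```

Genes whose LASSO coefficient is shrunk exactly to zero drop out of the selected set. In this toy setting, the two simulated differentially expressed genes should be retained, while the noise genes are typically excluded. The same auroc function could score any classifier's predicted probabilities on an external cohort, which is the role GSE21815 and GSE106582 play in the study.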