Abstract
BACKGROUND: The rising incidence and mortality of colon adenocarcinoma (COAD) underscore the clinical need for better prognostic biomarkers. Current prognostic models often lack precision, which motivates the search for molecular signatures in The Cancer Genome Atlas (TCGA). This study addresses that gap by using TCGA data to build a robust prognosis prediction model. METHODS: RNA-sequencing data and clinical information for COAD patients were obtained from the TCGA database. Non-negative matrix factorization (NMF) was applied to the TCGA-COAD cohort to identify molecular subtypes associated with taurine metabolism. Immune infiltration and survival outcomes were compared across the identified subtypes. Prognostic outcomes, specifically survival and recurrence, were then evaluated using Kaplan-Meier analysis. The prediction model was developed on a training set derived from TCGA data using least absolute shrinkage and selection operator (LASSO) regression and multivariate Cox regression. RESULTS: A total of 597 genes with prognostic significance in COAD were identified, among them several taurine metabolism-related genes, including HSPB1, NOS2, LEP, KPNA2, SERPINA1, NR1H2, ENO2, HSPA1A, TRPV1, GSR, ALOX12, GABRD, TERT, CLCN3, AGMAT, NOTCH3, and MYB. Based on the expression profiles of the taurine metabolism-related genes, the NMF algorithm classified patients in the TCGA-COAD cohort into two distinct expression clusters: cluster 1 (C1) and cluster 2 (C2). To examine the mechanisms differentiating these two clusters, 199 differentially expressed genes (DEGs) between them were identified.
Gene Ontology (GO) analysis of these DEGs showed that they were primarily involved in biological processes such as extracellular matrix (ECM) organization, collagen fibril organization, and cell-substrate adhesion. Immune activity also differed between the two taurine metabolism-related clusters. Cancer stem cell (CSC) scores were significantly higher in C1 patients than in C2 patients in the TCGA-COAD cohort. LASSO and Cox regression identified 17 taurine metabolism-related genes associated with COAD. From these, a nine-gene prognostic model (LEP, SERPINA1, ENO2, HSPA1A, GSR, GABRD, TERT, NOTCH3, and MYB) was developed to predict the prognosis of COAD patients. Receiver operating characteristic curve analysis of the model yielded area under the curve values of 0.698 at 1 year, 0.699 at 3 years, and 0.73 at 5 years. CONCLUSIONS: These findings have clinical implications: the nine-gene prognostic model could support patient stratification and inform treatment decisions for COAD. Future research should focus on prospective validation and on exploring therapeutic targets among the identified genes.
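As an illustration of how a multi-gene Cox model of this kind is commonly turned into a patient-level risk score, the following minimal Python sketch computes a linear predictor over the nine model genes and splits patients at the cohort median. The abstract does not report the fitted coefficients or the cutoff used, so the coefficient values, the median split, and the function names `risk_score` and `stratify` are all assumptions made for illustration only.

```python
from statistics import median

# HYPOTHETICAL coefficients, for illustration only: the abstract does not
# report the fitted Cox coefficients for the nine model genes.
HYPOTHETICAL_COEFS = {
    "LEP": 0.10, "SERPINA1": -0.05, "ENO2": 0.08,
    "HSPA1A": 0.06, "GSR": -0.07, "GABRD": 0.12,
    "TERT": 0.09, "NOTCH3": 0.04, "MYB": -0.11,
}


def risk_score(expression, coefs=HYPOTHETICAL_COEFS):
    """Return the linear predictor: sum of coefficient * expression per gene."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def stratify(scores):
    """Label each patient 'high' or 'low' risk relative to the median score.

    The median split is an assumed convention, not one stated in the abstract.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Toy expression values for three made-up patients.
    patients = {
        "P1": {gene: 1.0 for gene in HYPOTHETICAL_COEFS},
        "P2": {gene: 2.0 for gene in HYPOTHETICAL_COEFS},
        "P3": {gene: 3.0 for gene in HYPOTHETICAL_COEFS},
    }
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    print(stratify(scores))
```

In practice, the resulting high-risk and low-risk groups would be compared with Kaplan-Meier curves, and the score would be evaluated with time-dependent ROC analysis at the 1-, 3-, and 5-year horizons reported above.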