Abstract
BACKGROUND: Gastrointestinal cancer is a common malignancy with poor prognosis. Accurate prognostic prediction can improve the treatment of cancer patients, but the clinical features currently used provide insufficient information. This study aimed to establish an efficient survival prediction model for gastrointestinal cancer based on gene expression and clinical data.
METHODS: Using gastrointestinal cancer samples from The Cancer Genome Atlas, we built survival prediction models with gene expression profiles as input molecular features. A series of bioinformatics methods was applied to analyze the identified gastrointestinal cancer-related genes comprehensively, and the molecular mechanisms by which the newly identified genes may mediate cancer occurrence were preliminarily explored.
RESULTS: The random forest-based model (I) achieved an accuracy of 94.98% with a Matthews correlation coefficient (MCC) of 0.8995. The support vector machine-based model (II) achieved an accuracy of 94.98% with an MCC of 0.9000. Survival differed significantly between the two subtypes (S1 and S2, with 3-year survival rates ≥75% and ≤45%, respectively), and these subtypes have independent predictive value for patient survival. The models constructed in this study are inherently interpretable. Twenty key genes related to gastrointestinal cancer were identified. The functional analysis provides clues to the potential mechanisms of action of these genes in tumor initiation and progression. Drug target prediction identified potential targeted drugs for seven of these genes (NR3C1, HNF4A, DNAAF9, CDX2, ATP2B4, RBMS3, LIFR).
CONCLUSIONS: The findings of this study have significant implications for survival prediction and treatment decision-making in gastrointestinal cancer.
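For readers unfamiliar with the two metrics reported in RESULTS, the following is a minimal sketch of how accuracy and the Matthews correlation coefficient are computed for a binary classifier from confusion-matrix counts. The function names and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = dict(tp=45, tn=45, fp=5, fn=5)
print(accuracy(**example))
print(mcc(**example))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the two classes are imbalanced.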