Abstract
BACKGROUND: Colorectal signet ring cell carcinoma (CSRCC) is a rare subtype of colorectal cancer characterized by an exceptionally poor prognosis. Accurate survival prediction models for CSRCC are currently lacking. This study aimed to investigate the clinical characteristics of CSRCC and to develop and compare multiple machine learning–based models for predicting cancer-specific survival (CSS). METHODS: We retrospectively analyzed data from patients with CSRCC diagnosed between January 2000 and December 2021 in the Surveillance, Epidemiology, and End Results (SEER) database. Patients were randomly assigned to training and test cohorts in a 7:3 ratio. Prognostic variables were identified using the Boruta algorithm and multivariate Cox regression. Six prediction models were constructed: Cox proportional hazards (CoxPH), Lasso regression, random forest, XGBoost, gradient boosting machine (GBM), and DeepSurv. Model performance and clinical utility were assessed using the concordance index (C-index), area under the receiver operating characteristic curve (AUC), Brier score, and decision curve analysis (DCA). Global and local interpretability analyses were performed for the best-performing model. RESULTS: A total of 5,163 patients were included, comprising 3,610 in the training set and 1,553 in the test set. The median survival was 21 months, with 1-, 3-, and 5-year CSS rates of 72.0%, 46.4%, and 40.1%, respectively. The random forest model achieved the best overall performance. In the training set, the C-index was 0.760; the 1-, 3-, and 5-year AUCs were 0.849, 0.866, and 0.883, respectively; and the corresponding Brier scores were 0.139, 0.153, and 0.142. In the test set, the C-index was 0.721; the 1-, 3-, and 5-year AUCs were 0.784, 0.808, and 0.813, respectively; and the corresponding Brier scores were 0.156, 0.176, and 0.168. Variable importance analysis identified AJCC stage, summary stage, and tumor size as the most influential prognostic factors. CONCLUSION: The random forest model performed best among the six models for predicting CSS in CSRCC and retained its discrimination in the internal test cohort, indicating potential clinical value for individualized prognosis assessment and treatment planning.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12876-026-04764-y.
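
The C-index reported above measures how often a model ranks pairs of patients in the correct order of survival. The following is a minimal sketch of Harrell's C-index for right-censored data, written in plain Python for illustration only; it is not the authors' implementation, and the function name `concordance_index` and the toy data are assumptions. As a simplification, pairs with tied survival times are skipped.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    times       : observed follow-up times
    events      : 1 if the event (e.g. cancer-specific death) was observed, 0 if censored
    risk_scores : model output, where a higher score means higher predicted risk

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. The pair is concordant when subject i
    also has the higher risk score; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: four patients, the third is censored at 15 months.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.7, 0.2]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the 0.760 (training) and 0.721 (test) values above. For real analyses, an established survival analysis library is preferable to this quadratic-time loop.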