Abstract
BACKGROUND: The triglyceride-glucose (TyG) index, a surrogate marker of insulin resistance, has been linked to colorectal cancer (CRC). Nevertheless, the prognostic significance of the TyG index for CRC survival has not been established. This study sought to develop and validate a machine learning (ML)-based model combining the TyG index and inflammatory markers to predict long-term survival in CRC patients.
METHODS: A retrospective study was performed on 1,893 CRC patients who underwent radical surgery at The First Affiliated Hospital of Kunming Medical University. Patients were randomly assigned to a training cohort (70%) or an internal validation cohort (30%). An external validation cohort (n=493) from another hospital was used to test model generalizability. Independent prognostic factors were identified via multivariate Cox regression. An integrative predictive model was constructed using several ML algorithms [random survival forest (RSF), eXtreme Gradient Boosting (XGBoost), and gradient boosting machine (GBM)] and evaluated by concordance index (C-index), receiver operating characteristic (ROC) curves, calibration plots, and decision curve analysis (DCA).
RESULTS: The final model integrated eight independent prognostic factors: age, lymphocyte-to-monocyte ratio (LMR), TyG index, carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA199), Union for International Cancer Control (UICC) stage, tumor differentiation, and perineural invasion. The model showed good discrimination (C-index: training =0.742, internal validation =0.735, external validation =0.752). ROC analysis showed consistent predictive accuracy for 1-, 3-, and 5-year survival [area under the curve (AUC): 0.79, 0.76, and 0.74, respectively]. SHapley Additive exPlanations (SHAP) analysis ranked the TyG index as the second-most influential prognostic indicator.
CONCLUSIONS: The TyG index is an independent predictor of long-term survival in CRC patients. Our validated ML model, combining the TyG index, inflammatory markers, and clinicopathological features, offers a reliable, economical, and practical clinical tool for prognosis assessment. Further multicenter, prospective studies are needed to confirm the general applicability of this model.
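The abstract does not state how the TyG index, LMR, or C-index were computed. As an illustrative sketch only, the code below assumes the commonly used definition TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2), LMR as the ratio of absolute lymphocyte to monocyte counts, and Harrell's formulation of the C-index. The function names and toy values are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations


def tyg_index(triglycerides_mg_dl, fasting_glucose_mg_dl):
    """Assumed definition: ln(TG [mg/dL] * FPG [mg/dL] / 2)."""
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2.0)


def lmr(lymphocyte_count, monocyte_count):
    """Lymphocyte-to-monocyte ratio from absolute counts in the same unit."""
    return lymphocyte_count / monocyte_count


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event. It is concordant when that patient also has the
    higher predicted risk. Ties in risk count as 0.5; pairs with tied
    times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in months, event flag (1 = death), model risk
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.735 to 0.752 should be read. In practice a maintained survival analysis library would normally be preferred over a hand-written pairwise loop.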