Abstract
BACKGROUND: Accurate prediction of response to neoadjuvant chemoradiotherapy (nCRT) in patients with locally advanced rectal cancer (LARC) is essential for optimizing treatment decisions. This study aimed to develop interpretable machine learning models based on computed tomography (CT) radiomics and clinical biomarkers to predict nCRT efficacy.

METHODS: A total of 272 patients with pathologically confirmed LARC were retrospectively included and divided into training (n = 156), internal validation (n = 67), and external validation (n = 49) sets. Radiomics features were extracted from pretreatment contrast-enhanced CT images. A radiomics score (R-score) was constructed from 10 LASSO-selected features with high reproducibility (intraclass correlation coefficient > 0.75). Clinical variables including carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA19-9) were incorporated. Logistic regression, support vector machine, random forest, decision tree, and XGBoost algorithms were used to develop predictive models. Model performance was assessed by area under the receiver operating characteristic curve (AUC), calibration curve, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) were used to interpret model output.

RESULTS: In the validation cohorts, the combined model using the XGBoost algorithm outperformed clinical-only and imaging-only models, achieving AUCs of 0.844 (internal validation) and 0.800 (external validation). The R-score was significantly higher in responders than in non-responders (P < 0.05 across all datasets). DCA demonstrated superior clinical net benefit of the combined model across threshold probabilities. SHAP analysis confirmed R-score as the most influential predictor of response.

CONCLUSIONS: The XGBoost-based combined model integrating CT radiomics and clinical biomarkers demonstrated robust performance and good interpretability in predicting nCRT response in LARC patients. This approach may support individualized treatment planning and risk stratification in clinical practice.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12880-026-02269-4.
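
For readers unfamiliar with the decision curve analysis named in the Methods, the quantity it plots is usually the standard net benefit, TP/n - (FP/n) * pt / (1 - pt), evaluated over a range of threshold probabilities pt. The Python sketch below illustrates that general formula only. It is not the study's code, and the function names and toy labels and probabilities are hypothetical, chosen for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted probability
    is at or above `threshold` (standard decision-curve formula)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives: responders flagged by the model at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: non-responders flagged by the model at this threshold.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient as a predicted responder."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = responder), not taken from the study.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.4, 0.2]

for pt in (0.25, 0.5, 0.75):
    model_nb = net_benefit(toy_labels, toy_probs, pt)
    all_nb = net_benefit_treat_all(toy_labels, pt)
    print(f"pt={pt:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model shows clinical net benefit at a given threshold when its value exceeds both the treat-all reference and zero, which is the net benefit of treating no one.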