Abstract
BACKGROUND: Colorectal cancer is a common digestive malignancy, and chemotherapy remains a cornerstone of its treatment. Myelosuppression, a frequent hematologic toxicity of chemotherapy, poses significant clinical challenges. However, no interpretable machine learning-based nomogram exists to predict chemotherapy-induced myelosuppression in patients with colorectal cancer. This study aimed to develop and validate an interpretable clinic-machine learning nomogram that integrates clinical predictors with multiple machine learning algorithms via a feature mapping algorithm. The model provides accurate risk estimation with clinical interpretability, supporting individualized prevention strategies and informing decision-making for patients receiving first-line chemotherapy. AIM: To develop and validate an interpretable clinic-machine learning nomogram for predicting chemotherapy-induced myelosuppression in colorectal cancer. METHODS: This retrospective study enrolled 855 patients with colorectal cancer who received first-line chemotherapy. Data were split into training (n = 612), validation (n = 153), and testing (n = 90) cohorts. Ten predictors were identified through least absolute shrinkage and selection operator (LASSO) regression, decision tree, random forest, and expert consensus. Ten machine learning algorithms were applied, and their performance was assessed by the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), calibration curves, and decision curve analysis. The optimal model was integrated into a clinic-machine learning nomogram via the feature mapping algorithm, and the nomogram was internally validated for predictive accuracy and clinical utility. RESULTS: Of the 855 enrolled patients, 765 (April 2020 to December 2023) were used for model training and validation, and 90 (January 2024 to July 2024) for internal testing. Baseline clinical features did not differ significantly between the training and validation cohorts (P > 0.05).
Ten predictors were identified through integrated feature selection and expert consensus: age, body surface area, body mass index, tumor location, albumin, carcinoembryonic antigen, carbohydrate antigen (CA) 19-9, CA125, chemotherapy regimen, and number of chemotherapy cycles. Among the ten machine learning algorithms, extreme gradient boosting achieved the best validation performance (AUC = 0.97, AUPRC = 0.92, sensitivity = 0.79, specificity = 0.92, accuracy = 0.88). Logistic regression identified the extra trees and random forest model outputs as independent predictors, and these were incorporated into the clinic-machine learning nomogram. The nomogram demonstrated superior discrimination (AUC = 0.96, AUPRC = 0.93, accuracy = 0.90, specificity = 0.95), good calibration, and greater net clinical benefit across a wide range of threshold probabilities (10%-90%). Internal testing further confirmed its robustness and generalizability (AUC = 0.95). CONCLUSION: The clinic-machine learning nomogram accurately predicts chemotherapy-induced myelosuppression in colorectal cancer, providing interpretability and clinical utility to support individualized risk assessment and treatment decision-making.
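The abstract reports sensitivity, specificity, and accuracy, along with decision curves evaluated over a 10%-90% threshold range, but does not show how these quantities are computed. The Python sketch below illustrates the standard definitions under stated assumptions: binary outcome labels (1 = myelosuppression), predicted probabilities from some model, threshold metrics taken from a confusion matrix at an assumed 0.5 cutoff (the study's cutoff is not stated), and the decision-curve net benefit NB(pt) = TP/n - (FP/n) x pt/(1 - pt), compared with a "treat all" reference. The function names and the toy cohort are hypothetical and are not the study's data or code.

```python
"""Decision-curve net benefit and threshold metrics: a minimal sketch.

All data below are HYPOTHETICAL toy values, not the study's cohort or
model outputs; function names are illustrative only.
"""
from typing import List, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling a case positive if prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:  # predicted positive, actually negative
            fp += 1
        elif y == 0:  # predicted negative, actually negative
            tn += 1
        else:  # predicted negative, actually positive
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity and accuracy at one (assumed) probability cutoff."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def net_benefit(y_true, y_prob, pt):
    """Model net benefit at threshold probability pt: TP/n - FP/n * pt/(1 - pt)."""
    n = len(y_true)
    tp, fp, _, _ = confusion_counts(y_true, y_prob, pt)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of intervening in every patient (the 'treat all' reference)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical toy cohort: 1 = myelosuppression, 0 = none.
y_true: List[int] = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob: List[float] = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.15, 0.05, 0.3, 0.25]

if __name__ == "__main__":
    print("sens, spec, acc at 0.5:", threshold_metrics(y_true, y_prob, 0.5))
    # Decision curve over the 10%-90% threshold range; "treat none" is always 0.
    for i in range(1, 10):
        pt = i / 10
        print(f"pt={pt:.1f}  model={net_benefit(y_true, y_prob, pt):+.3f}  "
              f"treat_all={net_benefit_treat_all(y_true, pt):+.3f}")
```

Plotting the model's net benefit against the "treat all" curve and the "treat none" line (net benefit 0) across thresholds gives a decision curve; a model adds clinical value at the thresholds where its curve lies above both references.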