Abstract
BACKGROUND: Among women with ductal carcinoma in situ (DCIS) who undergo breast-conserving surgery, a subset will later progress to invasive breast cancer (IBC). Mammograms carry rich tumor information that could support patient stratification, but current prediction methods rely on clinicopathological factors and overlook imaging features. METHODS: We retrospectively analyzed 140 DCIS patients from Harbin Medical University Cancer Hospital (2011-2020, with follow-up through 2025). Preoperative digital mammograms and clinicopathological data were collected; mammographic features were extracted using pyradiomics under the supervision of a senior radiologist. Features were selected by LASSO regression with 10-fold cross-validation. The dataset was split into training (n=100) and validation (n=40) sets (5:2 ratio). Sixteen machine learning algorithms combining mammographic deep learning features and clinicopathological variables were developed and compared for predicting DCIS recurrence. Model performance was assessed using receiver operating characteristic (ROC) analysis, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV); SHAP values were used for interpretation. RESULTS: The Gradient Boosting Machine (GBM) algorithm achieved the best predictive performance, with an AUC of 0.918 (95% CI 0.873-0.963) in the validation set. SHAP values indicated that the mammographic signature (MS) was the most important predictor, followed by Ki-67 index and histological grade. Patients who did not receive radiotherapy had higher recurrence rates than those who did. Decision curve analysis supported the model's clinical utility across a range of risk thresholds. CONCLUSION: We developed an interpretable GBM model that combines mammographic and clinical data to predict DCIS recurrence (AUC = 0.918). The key predictors were the mammographic signature, Ki-67 index, and histological grade, and the model offers clinicians a practical tool for personalized postoperative management.
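As an illustration of the evaluation metrics named in METHODS, the following is a minimal sketch of how sensitivity, specificity, PPV, NPV, and AUC can be computed from binary outcomes and predicted risk scores. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example, and none of the numbers are study data.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = recurrence."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Threshold the scores (assumed cutoff) and compute the four rates."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(num: int, den: int) -> float:
        # Undefined rates (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 3 recurrences and 5 non-recurrences with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

metrics = classification_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```

The pairwise AUC definition used here is equivalent to the area under the ROC curve; the threshold-dependent rates change with the chosen cutoff, whereas the AUC does not.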