Abstract
BACKGROUND: Accurate preoperative prediction of axillary lymph node (ALN) metastasis in breast cancer is crucial for surgical planning and for reducing morbidity. Conventional ultrasound and Doppler assessment is limited by subjectivity, while existing machine learning (ML) models often lack interpretability and multi-center validation. AIM: To evaluate 11 ML algorithms and develop a validated model that integrates ultrasound and Doppler features for ALN metastasis prediction, using SHapley Additive exPlanations (SHAP) for interpretability. METHODS: This retrospective dual-center study included 303 patients from Xiamen (internal cohorts: 212 training, 91 validation) and 102 patients from Longyan (external validation). Features were extracted from preoperative ultrasound and Doppler images. Key predictors were selected using recursive feature elimination (RFE) and SHAP. Gradient Boosting was identified as the best-performing algorithm and was compared with B-mode and Doppler submodels and with clinicopathological scores (Logical, Tumor, Tenon). Performance was assessed using the area under the receiver operating characteristic curve (AUC), calibration, and decision curve analysis (DCA); a web calculator was also developed. RESULTS: Five features were selected: tumor diameter, cortex-to-hilum ratio, lymph node systolic/diastolic ratio, peak systolic velocity, and end-diastolic velocity. The combined model achieved AUCs of 0.981 (training), 0.975 (internal validation), and 0.987 (external validation), outperforming the clinicopathological scores (AUCs 0.517-0.700). It also showed superior calibration (Brier scores 0.045-0.061) and higher net benefit in DCA. CONCLUSION: The Gradient Boosting model with SHAP provides accurate, interpretable prediction of ALN metastasis, supporting noninvasive risk stratification and personalized breast cancer management.
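
For readers unfamiliar with the three evaluation metrics named above, the following is a minimal illustrative sketch of how AUC, the Brier score, and the DCA net benefit are commonly computed from predicted probabilities. It is not the study's code: the function names and the eight-patient toy data are hypothetical, and the formulas are the standard textbook definitions (AUC as the fraction of correctly ranked positive/negative pairs, Brier score as the mean squared error of the predicted probability, and net benefit as TP/n minus FP/n weighted by the threshold odds).

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is ranked
    above a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (lower values indicate better calibration and accuracy)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Decision-curve net benefit at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data (NOT from the study): 1 = metastasis, 0 = no metastasis.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.6, 0.7]

if __name__ == "__main__":
    print("AUC:", auc(labels, probs))
    print("Brier score:", brier_score(labels, probs))
    print("Net benefit at 0.5:", net_benefit(labels, probs, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.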