Abstract
OBJECTIVE: To develop explainable machine learning models that integrate multimodal imaging and pathological biomarkers to predict axillary lymph node metastasis (ALNM) in patients with breast cancer, and to assess their clinical utility. MATERIALS AND METHODS: A retrospective study was conducted using clinical data from 401 patients with pathologically confirmed breast cancer. Ten machine learning algorithms, including Naïve Bayes, Random Forest, Logistic Regression, and Support Vector Machines, were implemented to construct predictive models. Model performance was assessed using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC). To enhance interpretability, SHapley Additive exPlanations (SHAP) were applied to determine feature importance and elucidate model predictions. RESULTS: The most influential predictive features included lymph node parenchymal thickness, lymph node enlargement, and tumor width. Among all models, the Naïve Bayes classifier demonstrated the highest performance. In the training cohort, the accuracy, precision, recall, and F1-score were 81.0%, 84.0%, 82.0%, and 82.0%, respectively. In the validation cohort, the corresponding values were 82.6%, 83.4%, 82.6%, and 82.0%. The AUCs for the training and validation cohorts were 0.880 and 0.902, respectively. CONCLUSION: The Naïve Bayes model demonstrated robust performance and interpretability in predicting ALNM. As a non-invasive and explainable tool, it provides clinical value for risk stratification, accurate diagnosis, and the development of individualized treatment strategies.
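For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and AUC can be computed from binary labels and predicted probabilities. It is illustrative only: the labels and scores are made-up toy values, not study data; the function names `classification_metrics` and `roc_auc` are hypothetical; and the 0.5 decision threshold and positive-class (unaveraged) definitions are assumptions, since the abstract does not state how the metrics were computed or averaged.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a random
    negative case (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = metastasis present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Note that AUC is computed from the raw scores and does not depend on the threshold, whereas the other four metrics change if the threshold is moved.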