Abstract
OBJECTIVE: Cardiovascular disease is the primary cause of mortality in long-term breast cancer (BC) survivors, yet predictive tools for cardiovascular-specific survival (CSS) in those with a second primary cancer (SPC) remain limited. This study aimed to develop a machine learning (ML) model to predict CSS in BC patients with SPC (BC-SPC).
METHODS: Patients with BC-SPC diagnosed between 2010 and 2021 were identified from the Surveillance, Epidemiology, and End Results (SEER) database. After variable screening with least absolute shrinkage and selection operator (LASSO) regression, five predictive models were constructed: extreme gradient boosting (XGBoost), the Cox proportional hazards model, random survival forest (RSF), DeepSurv, and support vector machine (SVM). Model performance was assessed using the concordance index (C-index), the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) analysis was performed to interpret and visualize the optimal model.
RESULTS: A total of 22,814 BC-SPC patients were included, among whom 565 cardiovascular disease-specific deaths occurred, with cumulative incidence rates of 1.29%, 3.06%, and 4.30% at 5, 8, and 10 years, respectively. RSF demonstrated the best performance, with a C-index of 0.749 in the training set and 0.752 in the validation set. Time-dependent AUCs at 5, 8, and 10 years were 0.774, 0.761, and 0.766 for the training set, and 0.752, 0.769, and 0.760 for the validation set, respectively. DCA indicated favorable net benefit across relevant thresholds. SHAP analysis identified age, radiation, marital status, chemotherapy, surgery, race, and sex as the key predictors, in descending order of importance. Risk groups defined by RSF risk scores differed significantly in CSS (log-rank p < 0.001). A Shiny-based web tool was developed for personalized prediction.
CONCLUSION: The RSF model with SHAP interpretation offers an accurate, user-friendly tool for individualized CSS prediction in BC-SPC patients and supports precision risk management.
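For readers unfamiliar with the C-index used to compare the models, the following is a minimal sketch of Harrell's concordance for right-censored data. It is not the implementation used in this study: the function name `harrell_c_index` and the toy data are hypothetical, pairs with tied times are skipped for simplicity, and deaths from non-cardiovascular causes are assumed to be coded as censored.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Illustrative Harrell's C-index (hypothetical helper, not the study's code).

    times       : follow-up time for each patient
    events      : 1 if the event of interest was observed, 0 if censored
    risk_scores : model output where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a patient with an observed event
        for j in range(n):
            # Comparable only if patient i had the event strictly before
            # patient j's event or censoring time (tied times are skipped).
            if i != j and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data (made up for illustration): four patients, one censored at 8 years.
toy_times = [5, 8, 10, 3]
toy_events = [1, 0, 1, 1]
toy_risk = [0.5, 0.5, 0.1, 0.9]
print(harrell_c_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the reported C-indices of 0.749 and 0.752 should be read.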