Interpretable machine learning for predicting cardiovascular-specific survival in breast cancer patients with second primary cancers: A SEER-based study



Abstract

OBJECTIVE: Cardiovascular disease is the primary cause of mortality in long-term breast cancer (BC) survivors, yet predictive tools for cardiovascular-specific survival (CSS) in those with a second primary cancer (SPC) remain limited. This study aims to develop a machine learning (ML) model for predicting CSS in BC patients with SPC (BC-SPC).
METHODS: Patients with BC-SPC diagnosed between 2010 and 2021 were identified from the Surveillance, Epidemiology, and End Results (SEER) database. After variable screening with least absolute shrinkage and selection operator (LASSO) regression, five predictive models were constructed: extreme gradient boosting (XGBoost), Cox proportional hazards model, random survival forest (RSF), DeepSurv, and support vector machine (SVM). Model performance was assessed using the concordance index (C-index), area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) analysis was performed to interpret and visualize the optimal model.
RESULTS: A total of 22,814 BC-SPC patients were included. Among these, 565 cardiovascular disease-specific deaths occurred, with cumulative incidences of 1.29%, 3.06%, and 4.30% at 5, 8, and 10 years, respectively. RSF demonstrated the best performance, with a C-index of 0.749 in the training set and 0.752 in the validation set. Time-dependent AUCs at 5, 8, and 10 years were 0.774, 0.761, and 0.766 for the training set, and 0.752, 0.769, and 0.760 for the validation set, respectively. DCA indicated favorable net benefit across relevant thresholds. SHAP analysis identified age, radiation, marital status, chemotherapy, surgery, race, and sex as the key predictors, in descending order of importance. CSS differed significantly among the risk groups defined by RSF risk scores (log-rank p < 0.001). A Shiny-based web tool was developed for personalized prediction.
CONCLUSION: The RSF model with SHAP interpretation offers an accurate, user-friendly tool for individualized CSS prediction in BC-SPC and supports precision risk management.
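
The C-index reported above measures how often a model ranks patients correctly: among comparable patient pairs, the share in which the patient with the higher predicted risk experiences the event earlier. The following is a minimal sketch of Harrell's C-index in plain Python, offered only to illustrate the metric. The function name `harrell_c_index` and the toy data are hypothetical, the abstract does not state which implementation the authors used, and pairs with tied event times are simply skipped here as a simplifying assumption.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data.

    times       -- follow-up time for each patient
    events      -- 1 if the event (e.g. cardiovascular death) was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk

    Simplification (assumption): a pair is comparable only when the patient
    with the strictly shorter time had an observed event; tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical toy data (not from SEER): four patients, one censored.
toy_times = [5, 8, 10, 3]
toy_events = [1, 1, 1, 0]
toy_risks = [0.8, 0.2, 0.3, 0.9]
toy_c_index = harrell_c_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of 0.749 and 0.752.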
