Novel models by machine learning to predict the risk of cardiac disease-specific death in young patients with breast cancer



Abstract

BACKGROUND: With major advances in adjuvant therapies, breast cancer (BC)-related deaths have decreased significantly. Increasing attention has been paid to the effect of cardiac disease on BC survivors, but few population-based studies have focused on young patients.

METHOD: Data on BC patients younger than 50 years were collected from the SEER database. A competing risk model was used to analyze the effects of clinicopathological variables on the risk of cardiac disease-specific death (CDSD) in these patients. An XGBoost prediction model was then constructed to predict the risk of CDSD. Prediction performance was assessed using receiver operating characteristic (ROC) analysis, area under the ROC curve (AUC) values, calibration curves, decision curves, and confusion matrices, and SHapley Additive exPlanations (SHAP) were used to interpret the models.

RESULTS: Our competing risk analysis showed that young BC patients with older age, low household income, a non-metropolitan residential environment, black race, unmarried status, the HR+ subtype, a higher T stage (T2-4), chemotherapy, or no surgery are at higher risk of CDSD. Further, five machine learning models were constructed to predict the CDSD risk of young BC patients, among which the XGBoost model showed the highest AUC (train set: AUC = 0.846; test set: AUC = 0.836). The confusion matrix of the XGBoost model showed that the sensitivity, specificity, and accuracy were 0.81, 0.94, and 0.94 for the train set, and 0.82, 0.95, and 0.96 for the test set, respectively. The SHAP graph indicated that median household income, marital status, race, and age at diagnosis were the four strongest predictors.

CONCLUSION: Independent CDSD risk factors for young BC patients were identified, and machine-learning prognostic models were constructed to predict their CDSD risk. Our validation results indicated that the probabilities predicted by our XGBoost model agree well with the actual CDSD risks, so the model can help identify high-risk populations and support the development of effective cardioprotection strategies. We hope our findings can support the growth of the new field of cardio-oncology.
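
For readers less familiar with the confusion-matrix metrics reported above, the following is a minimal sketch of how sensitivity, specificity, and the overall proportion of correct predictions (accuracy) are derived from the four cell counts. The function name `confusion_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the study or from the SEER data.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: events correctly predicted as events
    fn: events missed by the model
    fp: non-events wrongly predicted as events
    tn: non-events correctly predicted as non-events
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + fp + tn)   # share of all correct predictions
    return sensitivity, specificity, accuracy


# Hypothetical counts for a rare outcome (illustrative only, not study data).
sens, spec, acc = confusion_metrics(tp=40, fn=10, fp=50, tn=900)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

When the outcome is rare, as in this made-up example, non-events dominate the total, so accuracy tends to track specificity closely and says little about how many events were caught. This is why sensitivity is usually read alongside the other two values.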
