Machine learning and SHAP value interpretation for predicting the response to neoadjuvant chemotherapy and long-term clinical outcomes in Chinese female breast cancer



Abstract

BACKGROUND: Most models for predicting the response to neoadjuvant chemotherapy (NACT) in breast cancer (BC) are built on insufficient data and lack interpretability, and reports from China in this field are notably absent. This study is also the first to integrate the Advanced Lung Cancer Inflammation Index (ALI) into such a model and evaluate its effectiveness.

METHODS: Data from 3,036 female BC patients who received NACT at Heilongjiang Provincial Tumor Hospital (2008-2019; median follow-up 7.28 years) were analyzed. After screening, 2,909 patients were randomly assigned to training and validation cohorts at a 7:3 ratio. eXtreme Gradient Boosting (XGBoost), Gradient Boosting Classifier (GBC), and Support Vector Machine (SVM) models were compared to identify the best model for predicting pathological complete response (pCR), and SHapley Additive exPlanations (SHAP) were used to interpret its key features. The Least Absolute Shrinkage and Selection Operator (LASSO) Cox algorithm, combined with XGBoost and Random Forest (RF) models, identified 9 overlapping prognostic features, which were used to improve the predictive accuracy of a nomogram for overall survival (OS). Kaplan-Meier (KM) analysis was used to compare prognostic outcomes across patient groups.

RESULTS: The XGBoost model performed best in predicting pCR, with area under the curve (AUC) values of 0.88 and 0.72 in the training and validation sets, respectively. SHAP analysis indicated that ER status, HER2 status, ALI, and albumin (Alb) level were the four most important features. The prognostic model also achieved high AUC values in both the training and test sets. KM analysis indicated that lower ALI, non-pCR, and triple-negative BC were associated with worse clinical outcomes. However, the adverse prognostic impact of low ALI in this cohort was mainly reflected in long-term recurrence outcomes and in non-pCR patients.

CONCLUSION: This study is the first to introduce ALI into a prediction model for BC patients completing NACT and to develop a large-sample model based on XGBoost. Because of the specific nature of the indicators used, training and validation were conducted on real clinical data.
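
The abstract does not give the formula behind ALI. As a minimal sketch, assuming the commonly used definition (body mass index × serum albumin in g/dL ÷ neutrophil-to-lymphocyte ratio), the index could be computed as below. The function names, the units, and the example patient values are illustrative assumptions, not taken from the study, and the authors' exact definition or cutoff may differ.

```python
def neutrophil_lymphocyte_ratio(neutrophils: float, lymphocytes: float) -> float:
    """NLR: absolute neutrophil count divided by absolute lymphocyte count.

    Both counts must be in the same unit (for example 10^9 cells/L).
    """
    if neutrophils <= 0 or lymphocytes <= 0:
        raise ValueError("cell counts must be positive")
    return neutrophils / lymphocytes


def ali(weight_kg: float, height_m: float, albumin_g_dl: float,
        neutrophils: float, lymphocytes: float) -> float:
    """ALI under the assumed definition: BMI * albumin (g/dL) / NLR.

    Albumin reported in g/L should be divided by 10 before being passed in.
    """
    if weight_kg <= 0 or height_m <= 0 or albumin_g_dl <= 0:
        raise ValueError("weight, height and albumin must be positive")
    bmi = weight_kg / height_m ** 2  # body mass index in kg/m^2
    nlr = neutrophil_lymphocyte_ratio(neutrophils, lymphocytes)
    return bmi * albumin_g_dl / nlr


if __name__ == "__main__":
    # Hypothetical patient: 54 kg, 1.50 m, albumin 4.0 g/dL,
    # neutrophils 4.0 and lymphocytes 2.0 (10^9 cells/L).
    # BMI = 24.0, NLR = 2.0, so ALI = 24.0 * 4.0 / 2.0
    print(ali(54.0, 1.5, 4.0, 4.0, 2.0))  # 48.0
```

Under this definition a lower ALI reflects lower body mass, lower albumin, or a higher neutrophil-to-lymphocyte ratio, which is the direction the KM results above associate with worse outcomes.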
