Development and Validation of an Interpretable Machine Learning Prediction Model for Total Pathological Complete Response after Neoadjuvant Chemotherapy in Locally Advanced Breast Cancer: Multicenter Retrospective Analysis



Abstract

Objective: To develop an interpretable machine learning (ML) model that predicts the probability of achieving total pathological complete response (tpCR) after neoadjuvant chemotherapy (NAC) in patients with locally advanced breast cancer (LABC).

Methods: This multicenter retrospective study included pre-NAC clinicopathological data from 698 patients with LABC. Patients were divided into tpCR and non-tpCR groups according to postoperative pathological outcomes. Data from 586 patients at Shanghai Ruijin Hospital were randomly assigned to a training set (80%) and a test set (20%), and data from the remaining 112 patients, from our hospital, were used for external validation. Variables were selected using least absolute shrinkage and selection operator (LASSO) regression. Predictive models were constructed using six ML algorithms, including decision tree, K-nearest neighbors (KNN), support vector machine, light gradient boosting machine, and extreme gradient boosting. Model performance was assessed using receiver operating characteristic (ROC) curves, precision-recall (PR) curves, confusion matrices, calibration plots, and decision curve analysis (DCA), and the best-performing algorithm was selected by comparing these metrics. Variable importance was ranked using SHapley Additive exPlanations (SHAP) to improve the interpretability of the model and address the "black box" problem.

Results: A total of 191 patients (32.59%) achieved tpCR following NAC. LASSO regression identified five variables as predictive factors for model construction: tumor size, Ki-67, molecular subtype, targeted therapy, and chemotherapy regimen. The KNN model outperformed the other five classifiers, achieving area under the curve (AUC) values of 0.847 (95% CI: 0.809-0.883) in the training set, 0.763 (95% CI: 0.670-0.856) in the test set, and 0.665 (95% CI: 0.555-0.776) in the external validation set. DCA showed that the KNN model yielded the highest net benefit across a wide range of threshold probabilities in both the training and test sets. SHAP analysis of the KNN model indicated that targeted therapy was the most important factor in predicting tpCR.

Conclusion: An ML prediction model based on clinical and pathological data collected before NAC was developed and validated. The model accurately predicted the probability of achieving tpCR in patients with LABC after NAC. SHAP enhanced the interpretability of the model and can assist clinical decision-making and therapy optimization.
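Two of the evaluation quantities named above, the AUC and the net benefit used in DCA, can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so the function names `roc_auc` and `net_benefit` and the toy labels and scores below are hypothetical and are not study data. The sketch assumes the standard definitions: AUC as the probability that a randomly chosen tpCR case receives a higher predicted score than a randomly chosen non-tpCR case, and net benefit at threshold probability p_t as TP/n - (FP/n) * p_t / (1 - p_t).

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit at one threshold probability, as used in DCA.

    Cases with a score >= threshold are treated as predicted positive.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = tpCR, 0 = non-tpCR); not taken from the study.
y_toy = [1, 1, 0, 1, 0, 0, 0, 1]
s_toy = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

print("AUC:", roc_auc(y_toy, s_toy))
print("Net benefit at 0.5:", net_benefit(y_toy, s_toy, 0.5))
# "Treat all" reference strategy: every patient is predicted positive.
print("Treat-all net benefit at 0.5:", net_benefit(y_toy, [1.0] * len(y_toy), 0.5))
```

A decision curve plots `net_benefit` over a grid of thresholds for the model alongside the treat-all and treat-none (net benefit 0) strategies; a model is considered useful at thresholds where its curve lies above both.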
