Abstract
OBJECTIVE: This study aimed to develop and validate a postoperative venous thromboembolism (VTE) risk prediction model specifically for patients with esophageal cancer, using machine learning techniques to support clinical decision-making and improve patient outcomes.
METHODS: Eight machine learning models were constructed and evaluated: Logistic Regression (LR), Random Forest (RF), Neural Network (NN), Gradient Boosting Machine (GBM), Support Vector Machine (SVM), eXtreme Gradient Boosting (XGBoost), AdaBoost, and Decision Tree. Model performance was assessed using the Brier score, calibration slope, F1 score, Youden index, area under the curve (AUC), and accuracy (ACC). Decision curve analysis (DCA) was used to evaluate clinical utility, calibration curves were used to assess agreement between predicted and observed risk, and SHapley Additive exPlanations (SHAP) plots were used to improve model interpretability.
RESULTS: The study included 1595 patients diagnosed with esophageal cancer, of whom 407 (25.52%) developed VTE during the study period. The models were ranked across the performance metrics, and GBM achieved the highest overall score (44), with a Brier score of 0.151, a calibration slope of 1.031, an F1 score of 0.827, a Youden index of 0.466, an AUC of 0.797, and an ACC of 0.757. Key predictors included D-dimer and lymphocyte count (LYM). SHAP analysis quantified the relative contribution of each predictor to the model's risk assessment.
CONCLUSION: This study established a VTE risk prediction model based on GBM that showed good discrimination, calibration, and clinical applicability for patients with esophageal cancer. Integrating such machine learning models into clinical practice could help reduce VTE-related morbidity and mortality in high-risk patient populations.
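As an illustration of the performance metrics named in METHODS, the following minimal sketch computes the Brier score, F1 score, Youden index, accuracy, and AUC from predicted probabilities. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are assumptions made for illustration, and the calibration slope is omitted because it requires fitting a logistic recalibration model.

```python
# Illustrative sketch only; function names, threshold, and data are hypothetical.

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """F1, Youden index, and accuracy at an assumed probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1_denominator = 2 * tp + fp + fn
    return {
        "f1": 2 * tp / f1_denominator if f1_denominator else 0.0,
        "youden": sensitivity + specificity - 1.0,
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, y_prob):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = VTE, 0 = no VTE.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.2]

toy_brier = brier_score(y_true, y_prob)
toy_metrics = threshold_metrics(y_true, y_prob)
toy_auc = auc(y_true, y_prob)
```

On this toy data the Brier score is about 0.175 and the AUC is 0.80. A lower Brier score indicates better overall probabilistic accuracy, while the Youden index (sensitivity + specificity - 1) depends on the chosen threshold.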