Abstract
BACKGROUND: Venous thromboembolism (VTE) is a frequent and potentially life-threatening complication in patients with intracerebral hemorrhage (ICH) treated in the intensive care unit (ICU). However, the necessity of prophylactic anticoagulation in these patients remains controversial. This study aims to develop an interpretable machine learning (ML) model that predicts the risk of VTE in critically ill ICH patients, thereby enabling timely and individualized preventive measures.
METHODS: Clinical data from the MIMIC-IV database and from ICU patients diagnosed with ICH at Qinghai Provincial People's Hospital were analyzed retrospectively. After data preprocessing, 1,545 cases from the MIMIC-IV database were randomly divided in a 7:3 ratio into a training set (1,097 cases) and a test set (448 cases). Data from 151 ICH patients treated in the ICU of Qinghai Provincial People's Hospital between January 2020 and December 2024 served as an external validation set. The Least Absolute Shrinkage and Selection Operator (LASSO) algorithm was applied for feature selection. Model performance was assessed using the area under the curve (AUC), decision curve analysis (DCA), accuracy, positive predictive value (PPV), and negative predictive value (NPV). The optimal model was further explained using the SHapley Additive exPlanations (SHAP) method.
RESULTS: The XGBoost model exhibited the best predictive performance, with AUC values of 0.936, 0.778, and 0.761 for the training set, test set, and external validation set, respectively. Feature importance analysis identified the 10 most influential features as ICU stay duration, age, prothrombin time, triglycerides, albumin, body mass index, partial thromboplastin time, blood glucose, white blood cell count, and systolic blood pressure.
CONCLUSION: The XGBoost model accurately predicts VTE occurrence in ICH patients in the ICU. The SHAP method quantifies the impact of individual pathophysiological parameters on each patient's prediction, thereby supporting personalized risk stratification and preventive treatment.
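The evaluation metrics named in METHODS (AUC, accuracy, PPV, and NPV) can be illustrated with a minimal Python sketch. It is not the study's code: the function names `roc_auc` and `threshold_metrics`, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration. The AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher predicted risk, which is one standard way to define it.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outranks a
    random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, PPV, and NPV after classifying score >= threshold as VTE.
    The 0.5 default is an assumption, not a value taken from the study."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        # PPV: share of predicted-positive cases that truly had VTE
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        # NPV: share of predicted-negative cases that truly had no VTE
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
    }


if __name__ == "__main__":
    # Toy data only: 1 = VTE, 0 = no VTE; scores are made-up predicted risks.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    risks = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
    print("AUC:", round(roc_auc(labels, risks), 3))
    print(threshold_metrics(labels, risks))
```

Unlike the AUC, PPV and NPV depend on both the chosen threshold and the prevalence of VTE in the cohort, so values obtained in one dataset may not carry over to another.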