Abstract
BACKGROUND: Early identification of high-risk patients with sepsis is crucial for improving outcomes. This study aimed to develop and validate a machine learning (ML) model that predicts early (7-day) mortality in patients with sepsis from routine clinical data obtained immediately after diagnosis. METHODS: Data were collected from four tertiary hospitals across diverse regions of China. Seven ML algorithms were used to construct the prediction model. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), calibration curves, decision curve analysis (DCA), and an assessment of clinical applicability. The SHapley Additive exPlanations (SHAP) method was used to interpret the model and identify key predictors. RESULTS: Among 8729 patients, 752 (8.6%) died within 7 days of admission. The artificial neural network (ANN) model demonstrated superior predictive performance, achieving an AUROC of 0.767 (95% CI: 0.748-0.787) in the training set and outperforming the traditional scoring systems APACHE II (AUROC: 0.710, 95% CI: 0.698-0.721) and SOFA (AUROC: 0.718, 95% CI: 0.707-0.729). This performance was consistent in the test set. Key predictors of early mortality included the Glasgow Coma Scale (GCS) score, blood chloride, and albumin levels. SHAP analysis provided interpretable insight into the model's predictions. CONCLUSION: We developed an ML model that predicts the risk of early (7-day) mortality in patients with sepsis from routine clinical data obtained immediately after diagnosis, and validated its potential as a clinically reliable tool (AUROC of 0.767 in the training set). SHAP-based interpretation helps clinicians understand the factors influencing mortality, identify high-risk patients early, and implement timely interventions to improve outcomes.
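For readers unfamiliar with the headline metric, the following minimal sketch illustrates how an AUROC and a 95% confidence interval of the kind reported above can be computed. It is not the study's code. The abstract does not state how the intervals were derived, so the percentile bootstrap used here is an assumption, and the function names (`auroc`, `bootstrap_ci`) and the toy labels and scores are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one death and one survivor")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC (an assumed method, see lead-in)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data only: 1 = died within 7 days, 0 = survived; scores are predicted risks.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
toy_scores = [0.10, 0.30, 0.35, 0.20, 0.80, 0.40, 0.15, 0.60, 0.55, 0.70]
print(auroc(toy_labels, toy_scores), bootstrap_ci(toy_labels, toy_scores))
```

The rank-based form is equivalent to the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, which is how the reported AUROC values can be read.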