Abstract
INTRODUCTION: Sepsis-associated acute kidney injury (S-AKI) is a severe complication in critically ill patients and is linked to increased short-term mortality and chronic kidney disease. Existing prognostic tools such as SOFA and SAPS II do not fully capture the complex interactions among clinical variables. Machine learning (ML) models show promise in intensive care, but few validated, interpretable models address in-hospital mortality in patients with S-AKI. This study aimed to develop and validate ML models for predicting in-hospital mortality in S-AKI patients and to identify the best-performing model.
METHODS: We conducted a retrospective analysis of the MIMIC-IV 3.0 database to identify adult ICU patients who met the Sepsis-3 criteria (suspected infection with an acute increase in SOFA score ≥2) and the KDIGO criteria for S-AKI. A prospective cohort from the General Hospital of Ningxia Medical University, enrolled from 2023 to 2025, was also included. Predictors recorded within 24 h of ICU admission included demographic information, comorbidities, vital signs, laboratory results, treatments, and severity scores. Variables with more than 20% missing data were excluded, and the remaining missing values were filled using interpolation. Feature selection was performed using the Boruta algorithm, and five ML models were trained (XGBoost, Random Forest, LightGBM, decision tree, and logistic regression). Model performance was evaluated using AUC, accuracy, sensitivity, specificity, and F1 score, and clinical utility was assessed with decision curve analysis. To enhance model interpretability, the SHapley Additive exPlanations (SHAP) method was employed.
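As an illustration of the missing-data rule stated in the methods (variables with more than 20% missing values are excluded), the following is a minimal sketch in plain Python. The function name `drop_sparse_columns`, the record layout, and the toy variable names are assumptions made for this example and are not taken from the study's code.

```python
import math


def drop_sparse_columns(rows, max_missing=0.20):
    """Return the sorted column names whose missing fraction is <= max_missing.

    rows is a list of dicts; a value counts as missing if it is absent,
    None, or a float NaN. Columns with MORE than max_missing are dropped.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    columns = sorted({key for row in rows for key in row})
    kept = []
    for col in columns:
        n_missing = sum(is_missing(row.get(col)) for row in rows)
        if n_missing / len(rows) <= max_missing:
            kept.append(col)
    return kept


# Hypothetical toy records; names and values are illustrative only.
rows = [
    {"age": 71, "bun": 38.0, "lactate": None},
    {"age": 64, "bun": 22.0, "lactate": 2.1},
    {"age": 58, "bun": None, "lactate": None},
    {"age": 80, "bun": 51.0, "lactate": None},
    {"age": 69, "bun": 30.0, "lactate": 3.4},
]
kept = drop_sparse_columns(rows)
print(kept)
```

In this toy data, `bun` is missing in exactly 20% of records and is therefore kept, while `lactate` (60% missing) is dropped.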
RESULTS: Among 16,800 patients with S-AKI, non-survivors (those who died in hospital) were older and had higher disease severity scores, more pronounced fluid overload, poorer renal function, metabolic acidosis, coagulation disorders, and heightened inflammatory responses. The XGBoost model showed the best discrimination in internal validation (AUC 0.8799), outperforming the other ML models, with high sensitivity, accuracy, and F1 score. Decision curve analysis showed that LightGBM offered the greatest net clinical benefit across a range of threshold probabilities. SHAP analysis consistently identified SAPS II score, AKI stage, oxygenation index, and key biochemical markers (e.g., serum sodium and blood urea nitrogen) as the primary contributors to mortality risk, while basic demographic variables added little predictive value. External validation supported the discrimination and robustness of the XGBoost model, suggesting that the ML-based prognostic framework is broadly applicable.
CONCLUSION: This study established an externally validated, interpretable ML model for risk stratification in S-AKI that can support early identification of high-risk patients, personalized management strategies, and improved clinical outcomes in sepsis care.
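For readers unfamiliar with decision curve analysis, net benefit at a threshold probability p_t is conventionally computed as TP/n − (FP/n) · p_t/(1 − p_t), where TP and FP count patients whose predicted risk is at or above p_t. The sketch below implements that standard definition in plain Python; the function names and the toy labels and probabilities are hypothetical and do not reproduce the study's data or results.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients with predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = died in hospital) and predicted risks.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4, 0.3, 0.05]

nb_model = net_benefit(y_true, y_prob, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
print(nb_model, nb_all)
```

A decision curve plots these quantities over a range of thresholds; a model is considered clinically useful where its net benefit exceeds both the treat-all and treat-none (net benefit 0) strategies.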