Abstract
Acute kidney injury (AKI) is a serious complication in preterm infants admitted to neonatal intensive care units (NICUs), contributing to high mortality and long-term morbidity. We conducted a retrospective cohort study of 2,473 preterm infants from the MIMIC-III database. An extreme gradient boosting (XGBoost) model was developed and compared with six other machine learning algorithms and with the Score for Neonatal Acute Physiology II (SNAP-II). Features were selected using the Boruta algorithm. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUC-PR), calibration plots, and decision curve analysis. SHapley Additive exPlanations (SHAP) were applied to elucidate feature importance and interactions. The XGBoost model achieved superior discrimination (AUC: 0.922; AUC-PR: 0.618). Sensitivity analyses using multiple imputation and class weighting confirmed the robustness of these results. SHAP analysis identified SNAP-II and first-day urine output as the most influential predictors. Interaction analysis showed that two combinations synergistically increased the risk of AKI: a higher SNAP-II score with lower birth weight, and lower urine output with a positive fluid balance. We identified two actionable probability thresholds to guide clinical use: 0.146 (Youden Index) for enhanced monitoring and 0.88 (positive predictive value ≥ 80%) for prompt nephrology consultation. In summary, we developed and internally validated an interpretable XGBoost-based model that predicts AKI in preterm infants within 7 days of NICU admission. It represents the first application of such a model to risk stratification in this population and is accompanied by a lightweight calculation tool (Shiny) intended to facilitate early kidney-protective interventions and risk management in this vulnerable population.
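The two decision thresholds mentioned in the abstract (one from the Youden Index, one from a positive predictive value target) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy labels and probabilities, and the choice to take the lowest cutoff that reaches the PPV target are all assumptions made for illustration, and the values it yields are unrelated to the reported 0.146 and 0.88.

```python
from typing import List, Optional, Tuple


def confusion_at(y_true: List[int], y_prob: List[float], threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when a case is called positive if prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true: List[int], y_prob: List[float]) -> Tuple[float, float]:
    """Cutoff maximizing J = sensitivity + specificity - 1 (first maximum wins)."""
    if not (0 in y_true and 1 in y_true):
        raise ValueError("both classes must be present")
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(y_prob)):
        tp, fp, tn, fn = confusion_at(y_true, y_prob, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def ppv_threshold(y_true: List[int], y_prob: List[float], min_ppv: float = 0.80) -> Optional[float]:
    """Lowest cutoff whose positive predictive value reaches min_ppv, or None.

    Assumption: "lowest qualifying cutoff" is one possible rule; PPV is not
    monotone in the cutoff, so other rules could give a different answer.
    """
    for t in sorted(set(y_prob)):
        tp, fp, _, _ = confusion_at(y_true, y_prob, t)
        if tp + fp > 0 and tp / (tp + fp) >= min_ppv:
            return t
    return None


# Hypothetical toy data: 1 = AKI, 0 = no AKI; probabilities are invented.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
toy_probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.60, 0.70, 0.85, 0.90]

monitor_cutoff, youden_j = youden_threshold(toy_labels, toy_probs)
consult_cutoff = ppv_threshold(toy_labels, toy_probs, min_ppv=0.80)
```

In this sketch the Youden-based cutoff is the more sensitive of the two and would correspond to the "enhanced monitoring" tier, while the PPV-based cutoff is stricter and would correspond to the "prompt consultation" tier. In practice both would be derived from held-out predictions rather than from the data used to fit the model.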