Abstract
BACKGROUND: Timely identification of pediatric sepsis remains a critical challenge in emergency and intensive care settings because clinical presentation varies widely across age groups. Existing scoring systems often lack temporal resolution and interpretability. We aimed to develop a real-time, machine learning-based prediction framework that integrates static and dynamic electronic health record (EHR) features to support early sepsis detection.
METHODS: This retrospective study included pediatric patients from Guangzhou Women and Children's Medical Center (GWCMC; n = 1,697) and an external validation cohort from the MIMIC-III database (n = 827). Irregular time-series data were imputed using a correlation-enhanced continuous time-window histogram with multivariate Gaussian processes (CTWH + MGP). We compared the predictive performance of XGBoost and gated recurrent unit (GRU)-based recurrent neural network (RNN) models over the 12-h window preceding clinical diagnosis. Models were validated internally and externally using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the Youden index. SHapley Additive exPlanations (SHAP) were used to identify key clinical features.
RESULTS: The CTWH + MGP-XGBoost model achieved the highest AUROC at the time of diagnosis (T = 0 h; AUROC = 0.915), while the GRU-based model demonstrated superior temporal stability across earlier windows. The top contributing features included lactate, white blood cell count, pH, and vasopressor use. External validation confirmed generalizability (MIMIC-III AUROC = 0.905). Simulated real-time alerts had a median lead time of 6.2 h before clinical diagnosis and showed good agreement with physician-confirmed cases (κ = 0.82).
CONCLUSIONS: Our results suggest that a dual-model ensemble combining interpolation-based preprocessing and interpretable machine learning enables robust early sepsis detection in pediatric populations. The system supports integration into EHR platforms for real-time clinical alerts and may inform prospective trials and quality improvement initiatives.
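As an illustration of one of the evaluation metrics named in the methods, the sketch below computes the Youden index (J = sensitivity + specificity − 1) and the score threshold that maximizes it. It is a minimal, dependency-free example on made-up labels and scores. The function name `youden_index` and the toy data are assumptions for illustration only and do not reproduce the study's pipeline or results.

```python
def youden_index(y_true, y_score):
    """Return (best_J, best_threshold), scanning every observed score as a cutoff.

    A case is predicted positive when its score is >= the threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_j, best_thr = -1.0, None
    for thr in sorted(set(y_score)):
        # True positives: septic cases scored at or above the cutoff.
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= thr)
        # True negatives: non-septic cases scored below the cutoff.
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < thr)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_j, best_thr = j, thr
    return best_j, best_thr


# Toy data (hypothetical): 1 = sepsis, 0 = no sepsis.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.10, 0.20, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]

best_j, best_thr = youden_index(y_true, y_score)
print(f"Youden J = {best_j:.2f} at threshold {best_thr:.2f}")
```

On this toy input the maximum is J = 0.75 at a threshold of 0.40. Two cutoffs tie at that value, and the sketch keeps the lower one, which favors sensitivity. Whether that tie-breaking rule suits an alerting system is a design choice, not something the abstract specifies.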