Abstract
BACKGROUND: Neonatal early-onset sepsis (EOS), occurring within 72 h of birth, is a major cause of neonatal morbidity and mortality worldwide, and its severity makes rapid, accurate diagnosis essential. Current diagnostic methods are hampered by non-specific clinical signs, which lead to underdiagnosis or overtreatment and leave a crucial gap in neonatal care. METHODS: This retrospective study analyzed data from 1613 full-term pregnant women at a single center in Shenzhen, China (2022), including 69 EOS cases. Ten machine learning models (e.g., Logistic Regression, Random Forest, XGBoost) were developed using maternal prenatal predictors. Data preprocessing involved imputation, standardization, and Lasso feature selection. Models were evaluated using 5-fold cross-validation, and SMOTE (Synthetic Minority Over-sampling Technique) was applied to address class imbalance. RESULTS: Among the ten models, XGBoost and Random Forest demonstrated the highest discriminative ability (AUC = 0.87). Although the default decision threshold yielded low sensitivity, a threshold optimized for clinical screening (0.04) achieved a sensitivity of 92.8% and a specificity of 73.1%. Key predictors included maternal temperature and inflammatory markers. CONCLUSION: This study demonstrates that machine learning models based on maternal factors have the potential to serve as high-sensitivity screening tools for EOS. Tuning the decision threshold is critical to maximizing clinical utility and entails a necessary trade-off between sensitivity and specificity. This approach provides a framework for developing and evaluating clinically oriented prediction models to improve neonatal care.
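The threshold trade-off described above can be illustrated with a minimal Python sketch. It assumes a fitted model has already produced predicted EOS probabilities; the function names `sens_spec` and `screening_threshold` and the toy scores are hypothetical and are not the study's code or data. The sketch selects the highest cut-off that still meets a target sensitivity, which gives the best specificity attainable under that constraint.

```python
from typing import Optional, Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold is called positive."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def screening_threshold(scores: Sequence[float], labels: Sequence[int],
                        target_sensitivity: float) -> Optional[float]:
    """Highest candidate threshold whose sensitivity meets the target (hypothetical helper).

    Raising the threshold can only lower sensitivity and raise specificity,
    so the highest qualifying threshold maximizes specificity subject to
    the sensitivity constraint. Returns None if no threshold qualifies.
    """
    best = None
    for t in sorted(set(scores)):
        sens, _ = sens_spec(scores, labels, t)
        if sens >= target_sensitivity:
            best = t
    return best


# Hypothetical toy data: predicted probabilities and true labels (1 = EOS).
scores = [0.01, 0.03, 0.05, 0.2, 0.02, 0.6, 0.04, 0.1]
labels = [0, 0, 1, 1, 0, 1, 0, 0]

best = screening_threshold(scores, labels, target_sensitivity=1.0)
print(best, sens_spec(scores, labels, best))  # prints 0.05 (1.0, 0.8)
print(0.5, sens_spec(scores, labels, 0.5))    # prints 0.5 (0.3333333333333333, 1.0)
```

On this toy data the default 0.5 cut-off detects only one of three positive cases, whereas the lower screening threshold detects all three at the cost of one false positive, mirroring the sensitivity-specificity trade-off reported in the abstract. In practice the threshold would be chosen on validation folds rather than on the data used to report performance.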