Abstract
BACKGROUND: Neonatal mortality is a major global health issue and is included in the Sustainable Development Goals (SDGs). Early neonatal deaths account for 47% of under-five mortality. A dependable model that predicts early neonatal mortality and identifies its risk factors is essential for improving child survival and health outcomes. We used several machine learning models to predict early and late neonatal mortality from a comprehensive secondary dataset from Oman.
METHODS: Ten machine learning (ML) models were used to predict early and late neonatal mortality in three setups: training on the original local dataset; a data-driven approach, the Synthetic Minority Over-Sampling Technique (SMOTE), to address the class imbalance; and an algorithm-driven approach, cost-sensitive classification. A total of 2,940 de-identified local records of newborn deaths were categorised as early deaths (0–6 days) or late deaths (7–27 days), and the models were trained and tested using 10-fold cross-validation. Because the dataset was imbalanced, model performance was assessed with several calibration and discrimination metrics.
RESULTS: Of the deaths, 71.6% occurred during the early neonatal period (0–6 days). Logistic Regression (LR), Linear Discriminant Analysis (LDA), and Random Forest (RF) were the top performers across the three setups, with an area under the precision-recall curve (AUC-PR) above 0.85 and favourable Brier scores. The RF Brier score was the most stable across the three setups; with SMOTE in particular, it was lower (0.1864) than those of LDA (0.2211) and LR (0.2164), indicating better calibration. The APGAR (Appearance, Pulse, Grimace, Activity, and Respiration) score at 5 min was identified as the most significant predictor of early and late neonatal mortality.
CONCLUSION: This study is one of the first to train and evaluate multiple ML algorithms under three different scenarios to predict early and late neonatal mortality and to identify associated risk factors using real data from Oman. RF, LDA and LR performed best in terms of both discrimination and calibration. The findings have the potential to inform clinical decision-making and prompt timely interventions to enhance survival rates.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12889-025-25796-1.
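To make the two headline metrics concrete, the following is a minimal, standard-library sketch of how a Brier score and a step-wise AUC-PR (average precision) can be computed from predicted probabilities. It is an illustration only, not the study's pipeline: the function names, the toy labels and probabilities, and the coding of early death as the positive class (1) are all assumptions. The last lines compute the Brier score of a constant prediction equal to the reported 71.6% share of early deaths, which may serve as a rough reference point when reading Brier scores, assuming they are evaluated on data with the original class distribution.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Simplification: tied probabilities are not grouped, so this sketch
    assumes all predicted probabilities are distinct.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank cases from highest to lowest predicted probability.
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this positive is retrieved.
            total += true_positives / rank
    return total / n_pos


# Toy data (hypothetical): 1 = early death, 0 = late death.
y_true = [1, 1, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.2]

toy_brier = brier_score(y_true, y_prob)
toy_auc_pr = average_precision(y_true, y_prob)

# Reference point: always predicting the reported base rate of early deaths.
early_share = 0.716
baseline_brier = early_share * (1 - early_share)

print(f"toy Brier score: {toy_brier:.4f}")
print(f"toy AUC-PR:      {toy_auc_pr:.4f}")
print(f"constant-prediction Brier at {early_share:.1%}: {baseline_brier:.4f}")
```

Lower Brier scores indicate better calibration, while higher AUC-PR indicates better discrimination; a Brier score is easiest to interpret against a no-skill reference such as the constant prediction above.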