Predicting early and late neonatal mortality using machine learning models in Oman

Abstract

BACKGROUND: Neonatal mortality is a major global health issue and is included in the Sustainable Development Goals (SDGs). Early neonatal deaths account for 47% of under-five mortality. A dependable model that predicts early neonatal mortality and identifies its risk factors is essential for improving child survival and health outcomes. We used several machine learning models to predict early and late neonatal mortality from a comprehensive secondary dataset from Oman.

METHODS: Ten machine learning (ML) models were used to predict early and late neonatal mortality in three setups: the original local dataset; a data-driven approach, the Synthetic Minority Over-sampling Technique (SMOTE), to address the class imbalance; and an algorithm-driven approach, cost-sensitive classification. A total of 2,940 de-identified local records of newborn deaths were categorised into early deaths (0–6 days) and late deaths (7–27 days), and the models were trained and tested using 10-fold cross-validation. Because the dataset was imbalanced, several calibration and discrimination metrics were used to assess model performance.

RESULTS: Of the deaths analysed, 71.6% occurred during the early neonatal period (0–6 days). Logistic Regression (LR), Linear Discriminant Analysis (LDA), and Random Forest (RF) were the top performers across the three setups, with an area under the precision-recall curve (AUC-PR) above 0.85 and an exemplary Brier score. However, the RF Brier score was more stable across the three setups, especially with SMOTE (Brier = 0.1864), compared with LDA (0.2211) and LR (0.2164), indicating effective calibration. The APGAR (Appearance, Pulse, Grimace, Activity, and Respiration) score at 5 min was identified as the most significant predictor of early and late neonatal mortality.

CONCLUSION: This study is one of the first to train and evaluate multiple ML algorithms under three different setups to predict early and late neonatal mortality and to identify associated risk factors using real data from Oman. The results indicate that RF, LDA and LR performed best in terms of discrimination and calibration. The findings have the potential to inform clinical decision-making and prompt timely interventions to enhance survival rates.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12889-025-25796-1.
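
The abstract reports two kinds of metric: AUC-PR for discrimination and the Brier score for calibration. As a rough illustration of what these numbers measure, the sketch below computes both in plain Python. It is not the authors' code. The function names and the toy labels and probabilities are hypothetical, the coding of early death as 1 and late death as 0 is an assumption, and the AUC-PR is approximated by the step-wise average precision, which may differ from the estimator used in the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome.

    Lower is better; 0 means every prediction was certain and correct.
    """
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def average_precision(y_true, y_prob):
    """Step-wise approximation of the area under the precision-recall curve.

    Records are ranked by predicted probability (highest first); precision is
    then averaged over the ranks at which a positive record appears.
    Tied probabilities are not treated specially in this sketch.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this recall step
    return precision_sum / n_pos


# Hypothetical toy data: 1 = early death (0-6 days), 0 = late death (7-27 days).
toy_labels = [1, 1, 1, 0, 1, 0, 1]
toy_probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.2]

toy_brier = brier_score(toy_labels, toy_probs)
toy_auc_pr = average_precision(toy_labels, toy_probs)
print(f"Brier score: {toy_brier:.4f}")
print(f"Average precision (AUC-PR approximation): {toy_auc_pr:.4f}")
```

Reporting both is a reasonable design choice for imbalanced data: the precision-recall summary depends only on how well the model ranks records, while the Brier score also penalises probabilities that are over- or under-confident.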
