Evaluating how different balancing data techniques impact on prediction of premature birth using machine learning models

评估不同数据平衡技术对使用机器学习模型预测早产的影响

阅读:3

Abstract

Premature birth can be defined as birth before 37 weeks of gestation, which is a significant global health issue, being the main cause for neonatal deaths. In this work, we evaluate machine learning models for predicting premature birth using Brazilian sociodemographic and obstetric data, focusing on the challenge of data imbalance, a common problem that can lead to biased predictions. We evaluate five data balancing techniques: Undersampling, Oversampling, and three Hybridsampling configurations where the minority class was increased by factors 2, 3, and 4. The machine learning models, including Decision Tree, Random Forest, and AdaBoost, are trained and evaluated on a dataset of over 483,000 cases. The use of the Hybridsampling approach resulted in an accuracy of 70%, a recall of 64%, and a precision of 74% in the Decision Tree model. Results show that Hybridsampling techniques significantly improves models' performance compared to Undersampling and Oversampling, highlighting the importance of a proper data balancing in predictive models for preterm birth. The relevance of our work is particularly significant for the Brazilian Unified Health System (SUS). By improving the accuracy of premature birth predictions, our models could assist healthcare providers in identifying at-risk pregnancies earlier, allowing for timely interventions. This integration could enhance maternal and neonatal care, reduce the incidence of preterm births, and potentially decrease neonatal mortality, especially in underserved regions.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。