Predicting determinants of unimproved water supply in Ethiopia using machine learning analysis of EDHS-2019 data

Abstract

Access to clean and safe drinking water is a global problem that affects more than 2 billion people. The problem is most acute in low-income nations, where many people still rely on unimproved water sources such as exposed wells and surface water. These sources place a heavy burden on public health systems because they are closely associated with the spread of waterborne illnesses. As a result, many people still suffer from water-related health problems, especially in underdeveloped nations where access to healthcare is limited and sanitation is often inadequate. However, the conventional analytical techniques used to study this problem often fail to capture the complex relationships among many variables, which can limit their ability to forecast future patterns. This study aimed to provide more accurate predictions and data-driven insights that can inform policy-making, resource allocation, and interventions to address Ethiopia's water crisis. The data source was the Ethiopia Demographic and Health Survey (EDHS-2019), which offers detailed data on socioeconomic, demographic, and water access determinants. Six machine-learning models were used, including k-Nearest Neighbors, Random Forest, Support Vector Machines, Gradient Boosting Machines, and Artificial Neural Networks. To enhance model performance and prevent overfitting, hyperparameters were tuned using random search with 7-fold cross-validation. Model performance was evaluated using standard classification metrics (accuracy, precision, recall, F1-score, and AUC). Permutation importance and SHAP values were used to examine feature importance in the tree-based models. The Random Forest model outperformed the other models on key measures: AUC (0.8915), F1-score (0.919), sensitivity (0.879), and specificity (0.967).
According to the feature importance analysis, "community-level poverty" was the most important predictor, followed by "household wealth index" and "age of household head." Spatial analysis revealed geographic differences in access to improved water sources, with rural areas the most affected. Machine-learning algorithms, Random Forest in particular, yielded significant insights into the factors influencing Ethiopia's unimproved water supply. The results highlight the need for targeted interventions in areas with high poverty rates and insufficient infrastructure. These data-driven insights can help decision-makers address Ethiopia's water crisis more effectively.
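
The workflow summarized in the abstract (random-search tuning with 7-fold cross-validation, evaluation with AUC, F1-score, sensitivity, and specificity, then permutation importance) might be sketched roughly as follows with scikit-learn. This is only an illustration under stated assumptions, not the authors' code: the data are synthetic rather than EDHS-2019, the hyperparameter ranges and the stratified folds are guesses, and the SHAP step is omitted to avoid an extra dependency.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    train_test_split,
)

# Synthetic stand-in for the survey data (assumption: this is NOT EDHS-2019).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search space; the abstract does not report the ranges used.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 4, 8, 16],
    "min_samples_leaf": [1, 2, 5, 10],
    "max_features": ["sqrt", "log2"],
}

# 7-fold cross-validation as stated in the abstract; stratification is assumed.
cv = StratifiedKFold(n_splits=7, shuffle=True, random_state=0)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,            # number of random configurations to try (assumed)
    scoring="roc_auc",
    cv=cv,
    random_state=0,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Held-out evaluation with the metrics named in the abstract.
y_pred = best_model.predict(X_test)
y_score = best_model.predict_proba(X_test)[:, 1]
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
metrics = {
    "auc": roc_auc_score(y_test, y_score),
    "f1": f1_score(y_test, y_pred),
    "sensitivity": tp / (tp + fn),   # recall on the positive class
    "specificity": tn / (tn + fp),   # recall on the negative class
}

# Permutation importance: drop in AUC when each feature is shuffled.
perm = permutation_importance(
    best_model, X_test, y_test, n_repeats=5, scoring="roc_auc", random_state=0
)
ranking = np.argsort(perm.importances_mean)[::-1]

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print("Feature indices ranked by permutation importance:", ranking.tolist())
```

With real survey data, the feature indices would map to named predictors such as the wealth index, and the scores would of course differ from those reported in the abstract.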
