Abstract
INTRODUCTION: Anemia is a reduction in the hemoglobin concentration of the blood that has significant adverse health consequences. It is a public health problem among women of reproductive age, affecting both poor and rich countries worldwide. This study therefore aimed to predict anemia status and identify its important features among women in Tanzania and Rwanda. METHODS: We merged Tanzania and Rwanda data from the Demographic and Health Surveys (DHS) conducted in 2019 and 2021 and analyzed 35,460 women aged 15 years and above. Feature selection was performed using the chi-square, F-test, recursive feature elimination, and mutual information methods. The best-performing method was used to select 19 features. Four machine learning algorithms (random forest, bagged decision tree, CatBoost, and extra trees) were applied to predict and classify anemia status. RESULTS: The chi-square method was the most effective feature selection method, with an accuracy of 92.82%. The study revealed a high prevalence of anemia in both countries. Among women in Tanzania, 2% had severe anemia, 11% had moderate anemia, and 10% had mild anemia. Among women in Rwanda, 4% had any anemia, including 2% with mild anemia. Of the machine learning algorithms tested, the extra-trees algorithm achieved the best predictive accuracy, 91.03%, while the CatBoost model had the lowest accuracy, 79.21%. CONCLUSION: Of the four predictive models built, the extra-trees model had the strongest predictive capability. Its results indicate that, among women of reproductive age, the most significant predictors of anemia were age, wealth index, BMI, education level, television watching, mosquito bed net use, and radio listening. The identification of such high-risk countries could provide useful information to decision-makers trying to reduce anemia among women.
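The chi-square feature selection step described above can be illustrated with a minimal sketch. It assumes Pearson's chi-square statistic computed on the contingency table of each categorical feature against a binary anemia label; the abstract does not state which implementation the study used. The function names, the feature names, and the toy data are hypothetical and are not taken from the DHS data.

```python
from collections import Counter


def chi_square_score(feature, target):
    """Pearson chi-square statistic for two equal-length categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # joint counts per (feature, target) cell
    feature_totals = Counter(feature)         # row marginals
    target_totals = Counter(target)           # column marginals
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for t_value, t_count in target_totals.items():
            # Expected cell count if feature and target were independent
            expected = f_count * t_count / n
            stat += (observed[(f_value, t_value)] - expected) ** 2 / expected
    return stat


def rank_features(features, target, k):
    """Return the k feature names with the largest chi-square statistic."""
    scores = {name: chi_square_score(values, target)
              for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = anemic, 0 = not anemic (not real survey values)
anemic = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "wealth_index": ["poor", "poor", "poor", "poor",
                     "rich", "rich", "rich", "rich"],
    "watches_tv": ["no", "yes", "no", "yes",
                   "no", "yes", "no", "yes"],
}

top = rank_features(features, anemic, k=1)
print(top)
```

In this toy example the first feature separates the two classes perfectly and the second is independent of the label, so the first is ranked highest. In the study itself, the analogous step retained 19 features, which were then passed to the four classifiers.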