Abstract
Immunization is a cost-effective public health intervention globally, including in Ethiopia. This study aims to identify key factors influencing immunization coverage among Ethiopian children under the age of five (0-59 months) by analyzing incomplete immunization with ensemble machine learning techniques. A total of 16,394 records from the Ethiopian Demographic and Health Survey (EDHS) were used, split 80% for training and 20% for testing, giving 13,115 training samples and 3,279 testing samples. The ensemble learning algorithms employed were Bagging methods (Bagging meta-estimator, Random Forest), Boosting methods (Gradient Boosting, XGBoost, LightGBM, AdaBoost, and CatBoost), and Voting ensembles combining bagging and boosting models. Additionally, Stacking was performed using XGBoost and CatBoost as base models, with Random Forest, K-Nearest Neighbors (KNN), Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Logistic Regression as meta-models. All models were implemented in Python. On the test data, the voting model combining the Bagging meta-estimator and XGBoost achieved the highest performance, with an accuracy of 95.94%, F1-score of 95.89%, recall of 94.81%, and precision of 97.07%. Its results were visualized with a confusion matrix, its AUC-ROC was 96%, and its cross-validation score of 95.75% supports its reliability. The most influential factors for incomplete immunization include marital status and residence, among others. The findings provide valuable insights for targeted interventions, supporting improved immunization practices and contributing to better child health outcomes.
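As an illustration only, the 80/20 split sizes and the idea behind a voting ensemble can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the study's implementation: the helper names `split_sizes` and `soft_vote`, the use of soft (probability-averaging) voting, the 0.5 threshold, and the example probabilities are all hypothetical, since the abstract does not state which voting scheme was used.

```python
import math


def split_sizes(n_samples, test_fraction=0.2):
    """Return (n_train, n_test), rounding the test share up to a whole sample."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


def soft_vote(model_probs, threshold=0.5):
    """Average each model's predicted probability per sample, then threshold.

    model_probs: one list of probabilities per model, all of equal length.
    Assumption: soft voting; the abstract does not specify the voting scheme.
    """
    n_models = len(model_probs)
    averaged = [sum(ps) / n_models for ps in zip(*model_probs)]
    return [1 if p >= threshold else 0 for p in averaged]


# Split sizes for the 16,394 records mentioned in the abstract.
n_train, n_test = split_sizes(16394)

# Hypothetical probabilities of incomplete immunization from two base models
# (stand-ins for a bagging model and a boosting model) for four children.
bagging_probs = [0.90, 0.40, 0.20, 0.55]
boosting_probs = [0.80, 0.70, 0.10, 0.35]
labels = soft_vote([bagging_probs, boosting_probs])
```

Under these assumptions, rounding the test share up reproduces the 13,115 and 3,279 sample counts reported above. In the voting step, a child is labeled as incompletely immunized only when the averaged probability of the two models reaches the threshold.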