Abstract
OBJECTIVES: This study aimed to use machine learning algorithms to predict zero-dose status among children in Tanzania and to identify the factors contributing to it, using the most recent nationally representative data. DESIGN: Cross-sectional study. SETTING: The study was set in Tanzania and used data from the 2022 Tanzania Demographic and Health Survey, accessed from http://www.dhsprogram.com. PARTICIPANTS: A total of 2120 children aged 12-23 months were included. OUTCOME MEASURE: Seven classification algorithms were used: logistic regression, decision tree classifier, random forest classifier (RF), support vector machine, k-nearest neighbours, XGBoost (XGB) and naive Bayes. The dataset was randomly divided into a training set (80%) and a testing set (20%). After training, each model was evaluated on the testing set to measure its ability to generalise to unseen data, using accuracy, precision, recall, F1-score and area under the curve (AUC) as performance metrics. RESULTS: An estimated 7.45% of children (95% CI 6.73% to 8.65%) were categorised as zero-dose children. The RF classifier achieved the highest performance among the evaluated algorithms (accuracy=0.95, precision=0.94, recall=0.96, F1-score=0.95 and AUC=0.99), making it the best-performing of the seven supervised machine learning models for predicting zero-dose children in Tanzania. Maternal unemployment made the largest positive contribution (+0.060) to the prediction of zero-dose status. Lack of maternal education made the second largest positive contribution (+0.048), indicating that children of mothers without formal education are more likely to be zero-dose. Small family size was the third most influential factor, with a positive contribution of +0.040. CONCLUSIONS: The RF classifier was the top-performing model for predicting which children in Tanzania have not received any vaccinations. 
The model identified zero-dose children accurately, which suggests that machine learning can support public health initiatives and help target vaccination strategies. Applying it could help improve health outcomes and reduce the prevalence of vaccine-preventable diseases in Tanzania.
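The evaluation described under OUTCOME MEASURE (an 80/20 random split followed by accuracy, precision, recall, F1-score and AUC on the held-out set) can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the random seed, the 0.5 decision threshold and the toy labels and scores are all assumptions introduced for illustration, and AUC is assumed here to mean the area under the receiver operating characteristic curve.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle the indices 0..n-1 and return (train, test) index lists."""
    rng = random.Random(seed)  # the seed is an arbitrary, illustrative choice
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = zero-dose)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Split sizes for a sample the size of the study's (2120 children).
train_idx, test_idx = train_test_split_indices(2120)

# Toy held-out labels and model scores (invented, not study data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.6, 0.1, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

With 2120 records, the split yields 1696 training and 424 testing indices. The threshold-based metrics depend on the chosen cut-off, whereas AUC is computed from the scores alone, which is why the two are reported separately.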