Abstract
BACKGROUND: Diarrheal disease remains a major cause of under-five mortality in low- and middle-income countries (LMICs). This study investigates the determinants of childhood diarrhea in Nigeria using a dual approach that compares classical epidemiological methods with machine learning (ML) to identify the most important predictors and to improve intervention targeting.
METHODS: A cross-sectional analysis of 33,924 children aged < 5 years was conducted using the 2018 Nigeria Demographic and Health Survey (NDHS). Traditional logistic regression was compared with ML models (Random Forest, Gradient Boosting Machines, Decision Trees) to capture non-linear associations and variable importance, with model performance assessed using metrics such as the area under the curve (AUC).
RESULTS: The prevalence of diarrhea was 11.98%, with significant disparities between regions. Child's age was the strongest predictor across all models, with significantly higher odds among children aged 6-23 months (AOR = 2.48-2.54). Higher maternal education was protective (AOR = 0.77-0.79), while media exposure showed complex, multifaceted associations. The urban-rural wealth index was a robust socioeconomic predictor. Logistic regression had the best predictive performance (AUC = 0.727), closely followed by Gradient Boosting (AUC = 0.718). Sensitivity analysis showed that multiple imputation by chained equations (MICE) produced more accurate estimates than complete case analysis because missingness was not random.
CONCLUSION: The study underscores the importance of core modifiable determinants such as maternal education and contextual wealth. Methodological triangulation illustrates the complementarity of classical regression for inference and machine learning for feature discovery. These findings support the implementation of hyper-localized, multi-sectoral interventions that target high-risk age groups and areas, guided by sound data analysis, to optimize public health resources. This mixed approach provides a scalable model for disease burden measurement in LMICs.
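The AUC values reported above can be read as the probability that a model assigns a higher predicted risk to a randomly chosen child with diarrhea than to a randomly chosen child without it. The following is a minimal sketch of that calculation in plain Python, not the study's actual analysis code. The function name `auc_from_scores` and the toy labels and scores are hypothetical and are not drawn from the NDHS data.

```python
def auc_from_scores(labels, scores):
    """Return the AUC as the share of (positive, negative) pairs in which
    the positive case has the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = diarrhea reported, 0 = not reported.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical predicted risks from two competing models.
model_a = [0.80, 0.30, 0.55, 0.60, 0.20, 0.70, 0.10, 0.40]
model_b = [0.60, 0.50, 0.40, 0.70, 0.30, 0.65, 0.20, 0.45]

auc_a = auc_from_scores(labels, model_a)
auc_b = auc_from_scores(labels, model_b)
print(f"Model A AUC: {auc_a:.3f}")
print(f"Model B AUC: {auc_b:.3f}")
```

The pairwise loop is quadratic in the number of observations, so it is suitable only for illustration. On a survey-sized sample, a rank-based implementation from a statistics library would be the practical choice.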