Abstract
BACKGROUND: Neglected Tropical Diseases (NTDs) affect 1.5 billion people worldwide, with 39% of the global burden occurring in Africa. In Kenya, NTDs remain endemic despite control efforts, and the co-endemicity of soil-transmitted helminths (STH), schistosomiasis (SCH), and lymphatic filariasis (LF) complicates intervention strategies. This study developed machine learning models to predict high-risk co-endemic areas using demographic and Water, Sanitation, and Hygiene (WASH) indicators.

METHODOLOGY: The study analyzed Kenya's 2022 NTD co-endemicity data from the Expanded Special Project for Elimination of Neglected Tropical Diseases, incorporating WASH and population variables. Three machine learning algorithms were trained to classify regions by STH prevalence level and co-endemicity status: Random Forest (RF), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). Model performance was evaluated using cross-validation, the area under the receiver operating characteristic curve (AUC), and variable importance analysis.

RESULTS: The RF model achieved the highest predictive performance (AUC = 0.70), followed by XGBoost (AUC = 0.66) and GBM (AUC = 0.62). Key predictors included improved sanitation access (mean importance score: 0.24), population density (0.21), and co-endemicity with LF/SCH (0.18). Spatial analysis identified Eastern and North-Eastern Kenya as persistent hotspots, coinciding with low WASH coverage (<40%).

CONCLUSION: Machine learning models identified high-risk NTD co-endemic areas in Kenya with moderate discriminative performance (AUC 0.62 to 0.70), and RF outperformed the other models. These findings support targeted interventions that integrate WASH improvements with mass drug administration in the identified hotspots. We propose a real-time dashboard for dynamic risk mapping to optimize resource allocation, a strategy aligned with Kenya's NTD Elimination Strategic Plan 2030.
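The AUC values reported for the three models can be read as ranking probabilities. An AUC of 0.70 means that, for a randomly drawn high-risk area and a randomly drawn low-risk area, the model assigns the higher risk score to the high-risk area about 70% of the time (ties count as half), while 0.5 corresponds to chance. The pure-Python sketch below computes AUC from this pairwise (Mann-Whitney) definition. It assumes a binary outcome (high-risk vs. not high-risk); the abstract does not state how AUC was aggregated for multi-level prevalence classes. The function name `roc_auc` and the example labels and scores are hypothetical illustrations, not the study's code or data.

```python
from itertools import product


def roc_auc(labels, scores):
    """Binary ROC-AUC via its pairwise (Mann-Whitney) interpretation.

    labels: 1 for a high-risk unit, 0 otherwise (assumed binary outcome).
    scores: model-predicted risk scores, higher meaning more at risk.
    Returns the fraction of (positive, negative) pairs that are ordered
    correctly, counting tied scores as half a correct ordering.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative unit")

    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0   # high-risk unit ranked above low-risk unit
        elif p == n:
            correct += 0.5   # tie: half credit
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical example: 4 high-risk and 4 low-risk areas.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.7, 0.8]
    print(roc_auc(labels, scores))  # 12 of 16 pairs ordered correctly -> 0.75
```

The quadratic pairwise loop is chosen for transparency rather than speed; for large datasets a rank-based formula or a library implementation is preferable, and with cross-validation the AUC would typically be computed on each held-out fold and then averaged.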