Abstract
BACKGROUND: Malaria remains a serious public health challenge in sub-Saharan Africa, and Nigeria accounts for almost 30% of global child malaria deaths. This study employs machine learning (ML) to improve prediction efficiency and to identify the most significant risk factors associated with childhood malaria in high-burden populations.

METHODS: A cross-sectional study was conducted among 693 under-5 children in Nigerian Internally Displaced Persons (IDP) camps. Sociodemographic data, household living conditions, and malaria knowledge were collected, along with Rapid Diagnostic Test (RDT) outcomes. The dataset was split 70:30 to train and evaluate four ML models: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and Gradient Boosting Machine (GBM). Model performance was measured by AUC, precision, recall, F1-score, and variable importance.

RESULTS: Malaria prevalence was 68.5%. Having a caregiver with no education was a key risk factor (aOR = 3.23, p = 0.026), while having a female caregiver was protective (aOR = 0.53, p = 0.024). The Random Forest model performed best (AUC = 0.892), with caregiver occupation and residential camp as the most important predictors. A wide knowledge-practice gap was observed: 60.3% of caregivers knew about malaria prevention, yet only 2% used bed nets.

CONCLUSION: Random Forest machine learning substantially improves the precision of malaria risk prediction. The results underscore the importance of modifiable factors such as caregiver occupation and education. Integrating these ML models into surveillance can enable precision public health interventions, including enhanced vector control and targeted health education, to combat malaria effectively in high-burden populations.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12879-025-12116-6.
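The evaluation pipeline described in METHODS (a 70:30 split, the four named classifiers, and AUC/precision/recall/F1 plus variable importance) can be sketched as below. This is an illustrative sketch only: the study's dataset is not reproduced here, so synthetic data stands in for the 693-child sample, and all feature indices are placeholders rather than the paper's actual variables.

```python
# Sketch of the abstract's ML workflow on synthetic data (NOT the study data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score

# Synthetic stand-in: 693 children, ~68.5% RDT-positive (weights sets class 0 share).
X, y = make_classification(n_samples=693, n_features=10, weights=[0.315],
                           random_state=42)

# 70:30 train/test split, as described in METHODS.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(random_state=42),
    "GBM": GradientBoostingClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]
    pred = model.predict(X_test)
    print(f"{name}: AUC={roc_auc_score(y_test, prob):.3f} "
          f"P={precision_score(y_test, pred):.3f} "
          f"R={recall_score(y_test, pred):.3f} "
          f"F1={f1_score(y_test, pred):.3f}")

# Variable importance from the Random Forest; in the study, caregiver
# occupation and residential camp ranked highest.
rf = models["RF"]
top = np.argsort(rf.feature_importances_)[::-1][:3]
print("Top RF feature indices (synthetic):", top)
```

With real survey data, the categorical predictors (caregiver education, occupation, camp) would first need encoding, and the reported per-model metrics would come from the held-out 30% exactly as printed above.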