Abstract
Soil-transmitted helminthiases (STH) are among the most common neglected tropical diseases in Nigeria and are transmitted primarily through soil contaminated with human feces. This motivated the present study of how ecological factors, in particular soil temperature, affect the distribution of STH in Nigeria. Environmental factors, particularly soil temperature, play a crucial role in determining STH distribution patterns by influencing helminth egg development and survival rates. Global epidemiological studies suggest that land surface temperature is significantly associated with STH infection prevalence, and that combining Random Forest (RF) with Particle Swarm Optimization (PSO) can improve the accuracy of STH species distribution predictions from soil temperature compared with conventional modelling approaches. In this paper, we propose a novel hybrid model (RFPSO) that combines the widely used Random Forest algorithm with Particle Swarm Optimization for feature selection and hyperparameter optimization, using a dataset from ESPEN covering the Nigerian geographical region together with a comprehensive analysis of that STH dataset. The main goal of the model is to produce accurate decision trees and to identify the best predictors. In this approach, the hybrid model performs feature selection, rather than relying on conventional Random Forest feature selection based on random sampling of the training sets. The model's predictive performance was evaluated against a traditional Random Forest and a deep learning Artificial Neural Network (ANN) using accuracy metrics: RFPSO achieved 91.40% accuracy, compared with 87% for RF and 80.97% for ANN. Integrating Particle Swarm Optimization with Random Forest therefore substantially enhances the accuracy of STH distribution modelling, particularly when soil temperature data are incorporated.
This hybrid approach offers improved feature selection capabilities and represents a significant advancement over conventional modeling techniques for parasitic disease distribution prediction.
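
As a rough illustration of the optimization idea summarized above, the sketch below implements a basic global-best Particle Swarm Optimization loop over two Random Forest style hyperparameters. It is not the implementation used in this study: the function names (`pso_maximize`, `toy_score`), the swarm coefficients, and the search bounds are all assumptions made for illustration, and `toy_score` is a hypothetical stand-in for the cross-validated accuracy of a Random Forest trained with the candidate hyperparameters.

```python
import random


def pso_maximize(score, bounds, n_particles=20, n_iters=50,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize score(position) inside box bounds with global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and score so far.
    pbest = [p[:] for p in pos]
    pbest_score = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            s = score(pos[i])
            if s > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], s
                if s > gbest_score:
                    gbest, gbest_score = pos[i][:], s
    return gbest, gbest_score


def toy_score(params):
    # ASSUMPTION: placeholder objective with a known peak at (200, 12).
    # In a real pipeline this would round the parameters to integers,
    # train a Random Forest, and return its cross-validated accuracy.
    n_trees, max_depth = params
    return -((n_trees - 200.0) / 100.0) ** 2 - ((max_depth - 12.0) / 10.0) ** 2


# Search over (number of trees, maximum tree depth); bounds are illustrative.
best, best_score = pso_maximize(toy_score, bounds=[(10, 500), (2, 30)])
```

The same loop could in principle drive feature selection by adding one dimension per candidate feature and thresholding each coordinate to include or exclude that feature, although the exact encoding used by the hybrid model is not specified here.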