Abstract
The Ministry of Health and Social Welfare of South Africa has made significant efforts to combat tuberculosis (TB), guided by the National Strategic Plan for HIV, STIs, and TB. However, progress in preventing and eradicating TB has been seriously hindered by reliance on ineffective diagnostic methods. This study aimed to predict TB and improve its diagnosis in South Africa using machine learning (ML) techniques. Data from the National Income Dynamics Study, conducted by the Southern Africa Labour and Development Research Unit, were analyzed. The dataset was split 70:30 into training and test sets, and six classifiers were fitted: Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Artificial Neural Network (ANN), and Logistic Regression (LR). Hyperparameters were tuned, and impurity-based measures were used to rank variable importance. RF achieved a sensitivity of 87.50% and an F1-score of 92.5%. DT achieved a sensitivity of 90.92% and an F1-score of 93.01%. ANN yielded a sensitivity of 81.72% and an F1-score of 87.53%. GBM showed a sensitivity of 91.32% and an F1-score of 94.55%. SVM showed a sensitivity of 90.03% and an F1-score of 97.72%. LR achieved a sensitivity of 96.55% and an F1-score of 96.80%. With sensitivity and F1-scores above 80% for every model, ML techniques present a significant opportunity for enhancing TB prediction and diagnosis in South Africa. This predictive approach may also be beneficial in other resource-constrained settings, including those elsewhere in sub-Saharan Africa.
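For readers less familiar with the two metrics reported above, the following minimal sketch shows how sensitivity and the F1-score follow from a binary confusion matrix. The counts are hypothetical and are not taken from the study; the class and function names are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with TB-positive as the positive class."""
    tp: int  # TB cases predicted as TB
    fp: int  # non-TB cases predicted as TB
    fn: int  # TB cases predicted as non-TB
    tn: int  # non-TB cases predicted as non-TB


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true TB cases that the model detects (also called recall)."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of predicted TB cases that are truly TB."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and sensitivity."""
    p, r = precision(c), sensitivity(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical test-set counts, chosen only to illustrate the arithmetic.
example = ConfusionCounts(tp=90, fp=5, fn=10, tn=95)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.4f}")
    print(f"precision   = {precision(example):.4f}")
    print(f"F1-score    = {f1_score(example):.4f}")
```

Sensitivity is the natural headline metric for a screening task, because a missed TB case (a false negative) is usually costlier than a false alarm; the F1-score adds a penalty for false positives so that a model cannot score well simply by labelling everyone as positive.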