Abstract
INTRODUCTION: Diagnosis of active Mycobacterium tuberculosis (Mtb) infection relies on clinical symptoms, imaging, and molecular testing, but these methods are often costly and slow. A rapid, accessible diagnostic approach is therefore urgently needed to support early detection and reduce ongoing tuberculosis transmission.
METHODS: A discovery cohort of 3,829 individuals and an external validation cohort of 405 individuals were included. Six supervised machine learning models were trained on routine laboratory data, and model interpretability was assessed with SHapley Additive exPlanations (SHAP).
RESULTS: Of the six models, XGBoost showed the best diagnostic performance in the internal cohort (accuracy 97.49%; sensitivity 97.56%; specificity 97.42%) and maintained strong performance in the external cohort (accuracy 93.67%; sensitivity 91.56%; specificity 91.13%). SHAP analysis indicated that the key predictors reflected characteristic host-response patterns: inflammation-related hypoalbuminemia, suppressed lipid metabolism (HDL-C and LDL-C), altered platelet activity (mean platelet volume, MPV), and reduced lymphocyte count (LYM).
CONCLUSION: This study presents a high-performing, interpretable machine learning model that accurately identifies active Mtb infection from routine blood tests. This low-cost, non-invasive approach has strong potential for use in resource-limited and high-burden settings.
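For readers less familiar with the three metrics reported above, the following is a minimal sketch of how accuracy, sensitivity, and specificity are derived from binary predictions. It is illustrative only and is not the study's evaluation code: the function name `diagnostic_metrics`, the label encoding (1 = active Mtb infection, 0 = control), and the toy labels are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    accuracy: float
    sensitivity: float
    specificity: float


def diagnostic_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Assumed encoding (hypothetical): 1 = active Mtb infection, 0 = control.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return DiagnosticMetrics(
        accuracy=(tp + tn) / (tp + tn + fp + fn),  # all correct / all cases
        sensitivity=tp / (tp + fn),                # true positive rate
        specificity=tn / (tn + fp),                # true negative rate
    )


# Toy data, invented for illustration: 4 infected, 6 controls.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy = diagnostic_metrics(toy_true, toy_pred)
print(f"accuracy={toy.accuracy:.2%}")        # accuracy=80.00%
print(f"sensitivity={toy.sensitivity:.2%}")  # sensitivity=75.00%
print(f"specificity={toy.specificity:.2%}")  # specificity=83.33%
```

In this toy example one infected case is missed and one control is flagged, so sensitivity is 3/4 and specificity is 5/6. The same function could be applied separately to internal and external cohort predictions to obtain one set of metrics per cohort.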