Abstract
BACKGROUND: Tuberculosis (TB) remains a major global health threat, causing approximately 1.5 million deaths each year. Despite progress in treatment, 15%-20% of patients still experience treatment failure or relapse, highlighting the urgent need for accurate predictive tools that identify high-risk patients early. Current methods based on clinical parameters are limited both in predictive accuracy and in their ability to reveal underlying biological mechanisms.
METHODS: We developed and validated a multi-omics integration prediction model. We retrospectively collected clinical data from 467 tuberculosis patients and integrated transcriptomic data from three independent public cohorts (GSE19491, GSE31312, GSE83456), involving 3,240 differentially expressed genes. Key features were selected through feature engineering and bioinformatics analysis. We systematically evaluated 12 machine learning algorithms and adopted an ensemble learning strategy to construct the final model. Model performance was evaluated by rigorous cross-validation and in prospective validation cohorts.
RESULTS: Clinical data analysis identified age, body mass index (BMI), and C-reactive protein (CRP) levels as significant predictors of treatment response. Transcriptomic analysis revealed 1,247 differentially expressed genes between responders and non-responders, enriched in immune response and metabolic pathways. Among the tested algorithms, the ensemble model based on Extra Trees performed best, with an area under the curve (AUC) of 0.986, significantly outperforming models that used only clinical data (AUC = 0.850) or only genomic data (AUC = 0.820). Feature importance analysis confirmed CRP, specific gene features (such as those in DNA repair and interferon response pathways), age, and BMI as the most important predictors. External validation confirmed the model's robustness (AUC = 0.972).
CONCLUSION: This study developed a high-precision prediction model that integrates clinical and genomic data and enables early identification of high-risk patients with poor treatment response. The model demonstrates excellent predictive performance and generalization ability, and offers a tool for advancing precision medicine in tuberculosis by guiding individualized treatment strategies to improve patient prognosis and control the spread of drug resistance.
CLINICAL TRIAL REGISTRATION: https://www.chictr.org.cn/, ChiCTR2300074328, 03/08/2023.
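NOTE ON THE AUC METRIC: The AUC values reported in the results can be read as the probability that a randomly chosen non-responder receives a higher predicted risk score than a randomly chosen responder. The following is a minimal sketch of that pairwise definition, not the study's code; the function name `roc_auc`, the label coding (1 = poor treatment response), and the toy scores are assumptions made purely for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study):
# 1 = poor treatment response, 0 = good treatment response.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

In this toy example, 15 of the 16 positive-negative pairs are ranked correctly. The pairwise form is quadratic in the number of patients, which is acceptable at the cohort sizes described here; rank-based implementations in standard libraries give the same value more efficiently.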