Abstract
OBJECTIVE: To identify risk factors for tuberculosis retreatment by integrating machine learning with latent class analysis (LCA), and to provide a scientific basis for targeted prevention and control strategies. METHODS: This retrospective study collected baseline and treatment-related data from 6,821 tuberculosis patients and applied machine learning and LCA to identify the key factors associated with a high risk of retreatment. RESULTS: The XGBoost model achieved an overall accuracy of 84% and an area under the ROC curve (AUC) of 0.938. The most influential factors associated with retreatment were the sputum examination result at month 6 or 8 of treatment, the treatment regimen, and the diagnostic classification. SHAP analysis further showed that a sputum examination status of "not performed" was strongly linked to increased retreatment risk. Logistic regression confirmed this finding: a status of "not performed" (OR = 123.47, P < 0.001) and a "positive" result (OR = 14.89, P = 0.02) at month 6 or 8 were both significant risk factors. LCA stratified patients into four distinct subgroups, among which those characterized by comorbid diabetes or prior treatment failure had the highest risk of retreatment. CONCLUSION: We recommend improving treatment adherence and efficacy monitoring for newly diagnosed patients, strengthening whole-course supervision, and optimizing management for elderly patients and those on long-term regimens.
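As a reading aid for the odds ratios reported above, the following minimal sketch shows how an odds ratio from a logistic regression relates to the underlying coefficient (OR = exp(beta)) and how it scales a baseline probability. It is not the study's analysis code. The function names and the 5% baseline risk are hypothetical, chosen only for illustration; the two odds ratios are the values quoted in the abstract.

```python
import math


def coef_from_odds_ratio(odds_ratio: float) -> float:
    """Return the logistic regression coefficient (log-odds) for an odds ratio."""
    return math.log(odds_ratio)


def odds_ratio_from_coef(beta: float) -> float:
    """Return the odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Scale the odds of a baseline probability by an odds ratio.

    Converts the probability to odds, multiplies by the odds ratio,
    and converts the result back to a probability.
    """
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratios quoted in the abstract for sputum status at month 6 or 8.
reported_odds_ratios = {"not performed": 123.47, "positive": 14.89}

# Coefficients on the log-odds scale implied by those odds ratios.
implied_coefs = {
    status: coef_from_odds_ratio(value)
    for status, value in reported_odds_ratios.items()
}

# HYPOTHETICAL baseline retreatment risk, for illustration only;
# the abstract does not report a reference-group risk.
hypothetical_baseline = 0.05

for status, value in reported_odds_ratios.items():
    risk = apply_odds_ratio(hypothetical_baseline, value)
    print(f"{status}: beta = {implied_coefs[status]:.3f}, "
          f"risk at assumed baseline = {risk:.3f}")
```

An odds ratio multiplies odds, not probabilities, so an OR of 123.47 does not mean the risk is 123 times higher. The conversion through a baseline risk makes that distinction explicit.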