Abstract
BACKGROUND: Tuberculosis (TB) remains a significant public health burden among older adults, yet predictive tools for this population are limited. This study aimed to develop and validate machine learning models to predict TB risk among older adults in Eastern China. METHODS: A prospective cohort of 33,935 participants aged ≥60 years was followed for over 8 years. TB diagnosis was confirmed through linkage with the national TB surveillance system. LassoCox regression was used to identify key predictors of TB risk. Four machine learning models, namely CoxBoost, Generalized Boosted Models (GBM), LassoCox, and Random Survival Forests (RSF), were developed and compared. Model performance was evaluated using time-dependent area under the receiver operating characteristic curve (AUC), Brier score, and concordance index. RESULTS: During follow-up, 387 participants developed TB, yielding an incidence rate of 134.5 per 100,000 person-years. The LassoCox model identified 14 predictors, including sex, alcohol consumption, dietary quality, body mass index, and C-reactive protein levels. Among the four models, the LassoCox model demonstrated the best discriminatory ability with an AUC of 0.717 (95% CI: 0.692-0.742), followed by GBM (AUC: 0.712, 95% CI: 0.687-0.737), CoxBoost (AUC: 0.708, 95% CI: 0.683-0.733), and RSF (AUC: 0.637, 95% CI: 0.611-0.663). The LassoCox model also demonstrated satisfactory calibration, with a Brier score of 0.338. Decision curve analysis confirmed clinical utility at threshold probabilities below 20%. Kaplan-Meier survival analysis showed significant differences between risk groups (log-rank P < 0.001), though survival curves revealed limited separation between low- and high-risk groups. CONCLUSION: The LassoCox model demonstrated acceptable predictive performance for TB risk in older Chinese adults.
These findings suggest that machine learning-based risk prediction tools could facilitate targeted TB screening by identifying high-risk individuals in aging populations, thereby enabling more efficient allocation of screening resources and earlier intervention. However, further model refinement and external validation in diverse populations are warranted before clinical implementation.
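
For readers unfamiliar with the concordance index named among the evaluation metrics, the following is a minimal illustrative sketch of Harrell's C-index for right-censored data. It is not the study's code: the function name `harrell_c_index`, its arguments, and the toy data are hypothetical, and pairs with tied follow-up times are skipped as a simplifying assumption.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- observed follow-up time per participant
    events      -- 1 if the event (e.g. TB diagnosis) was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplifying assumption: skip tied times. A pair is comparable
        # only when the shorter time ends in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event was given the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the cohort).
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of participants by time to event.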