Abstract
BACKGROUND: Coinfection with Mycobacterium tuberculosis (MTB) and human immunodeficiency virus (HIV) imposes a substantial global health burden. Patients with MTB infection face a heightened risk of progression to incident active tuberculosis (TB), which preventive therapy can mitigate. Current testing methods often fail to identify individuals who subsequently develop incident active TB.
METHODS: We developed random forest models to predict incident active TB from patients' medical data at HIV-1 diagnosis. We trained the models on clinical data routinely collected at enrollment in the Swiss HIV Cohort Study (SHCS). This dataset encompassed 55 people with HIV (PWH) who developed incident active TB 6 months after enrollment and 1432 matched PWH without TB enrolled between 2000 and 2023. External validation used data from the Austrian HIV Cohort Study, comprising 43 people with incident active TB and 1005 people without TB.
RESULTS: We predicted incident active TB with an area under the receiver operating characteristic curve of 0.83 (95% CI: 0.80-0.86) in the SHCS. After adjusting for ethnicity and region of origin and refitting the model with fewer parameters, we obtained comparable areas under the receiver operating characteristic curve of 0.72 (SHCS) and 0.67 (Austrian HIV Cohort Study). Our model outperformed the standard of care (tuberculin skin test and interferon-gamma release assay) in identifying high-risk patients, as demonstrated by a lower number needed to diagnose (1.96 vs 4).
CONCLUSIONS: Models based on machine learning offer considerable promise for improving care for PWH: they require no additional data collection and incur minimal additional costs while enhancing the identification of PWH who could benefit from preventive TB treatment.
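The number needed to diagnose (NND) cited in the results is commonly defined as the reciprocal of Youden's index, that is, 1 / (sensitivity + specificity - 1). The abstract does not state which definition the authors used, so the following is only a minimal sketch under that assumption. The function name and the sensitivity and specificity values are hypothetical and chosen for illustration; they are not taken from the study.

```python
def number_needed_to_diagnose(sensitivity: float, specificity: float) -> float:
    """Return NND = 1 / (sensitivity + specificity - 1).

    Assumes the common definition of NND as the reciprocal of Youden's J.
    A lower NND means fewer patients must be tested per correct diagnosis.
    """
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    youden_j = sensitivity + specificity - 1.0
    if youden_j <= 0.0:
        # A test at or below chance level has no meaningful NND.
        raise ValueError("Youden's J must be positive for NND to be defined")
    return 1.0 / youden_j


# Hypothetical inputs for illustration only (not study results).
print(number_needed_to_diagnose(0.75, 0.75))   # 2.0
print(number_needed_to_diagnose(0.5, 0.625))   # 8.0
```

Under this definition, a test with higher combined sensitivity and specificity yields a smaller NND, which is the direction of the comparison reported above.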