Abstract
BACKGROUND: Effective antiretroviral therapy that maintains durable viral suppression is key to ending the HIV epidemic in the United States. We evaluated the ability of machine learning algorithms to identify people with HIV (PWH) at risk of an unsuppressed viral load.

SETTING: Retrospective study among PWH in San Diego County (n = 18,916). The study used HIV data reported to public health (2017-2022) to predict the outcome of an HIV viral load >200 copies/mL during a year-long prediction window.

METHODS: The data were partitioned by calendar date into two training datasets and one validation dataset, so that performance was assessed on the prediction of future observations. A random forest model was used to generate outcome predictions for the overall population and stratified by race. Mediation analysis was undertaken to assess underlying causal pathways.

RESULTS: The model had an area under the receiver operating characteristic curve (AUC) of 82.2 (95% CI: 79.3 to 85.0), a sensitivity of 33.8% (95% CI: 28.6 to 39.0), and a specificity of 96.9% (95% CI: 95.7 to 97.2), corresponding to a positive predictive value of 55.7% (95% CI: 48.7 to 62.8) and a negative predictive value of 91.7% (95% CI: 90.6 to 92.8). The AUC was similar across races. Prior viral load characteristics were identified as the most important variables; however, they partially acted as mediators of underlying demographic factors (eg, race) and HIV infection risk factors (eg, injection drug use).

CONCLUSIONS: Machine learning algorithms using mandatorily reported public health HIV data can predict which PWH will have a future unsuppressed viral load. Future work will assess the model's clinical utility compared with existing data-to-care initiatives.
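The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) reported above are ratios taken from a single confusion matrix at one decision threshold, and PPV and NPV additionally depend on how common unsuppressed viral load is in the evaluated population. The Python sketch below illustrates these definitions using hypothetical counts, not the study's data; the names `ConfusionCounts`, `classification_metrics`, and `ppv_from_prevalence` are illustrative assumptions rather than anything from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts at one decision threshold (positive = unsuppressed viral load)."""
    tp: int  # predicted unsuppressed, observed unsuppressed
    fp: int  # predicted unsuppressed, observed suppressed
    tn: int  # predicted suppressed, observed suppressed
    fn: int  # predicted suppressed, observed unsuppressed


def classification_metrics(c: ConfusionCounts) -> dict:
    """Threshold-dependent metrics of the kind reported in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true positives flagged
        "specificity": c.tn / (c.tn + c.fp),  # share of true negatives not flagged
        "ppv": c.tp / (c.tp + c.fp),          # share of flags that are correct
        "npv": c.tn / (c.tn + c.fn),          # share of non-flags that are correct
        "prevalence": (c.tp + c.fn) / total,  # outcome frequency in this sample
    }


def ppv_from_prevalence(sens: float, spec: float, prev: float) -> float:
    """Bayes' rule: the PPV implied by sensitivity, specificity, and prevalence."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))


# Hypothetical counts for illustration only (not the study's data).
example = ConfusionCounts(tp=30, fp=20, tn=880, fn=70)
metrics = classification_metrics(example)
implied_ppv = ppv_from_prevalence(
    metrics["sensitivity"], metrics["specificity"], metrics["prevalence"]
)
```

With these made-up counts, sensitivity is 0.30 and PPV is 0.60; recomputing PPV from sensitivity, specificity, and the 10% prevalence yields the same 0.60. The formula makes explicit why the same model would show a different PPV in a population where unsuppressed viral load is more or less common.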
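Partitioning by calendar date, rather than at random, means the model is always evaluated on records later than those it was trained on. The Python sketch below shows one way to cut records into consecutive periods; the `Record` fields, the cutoff dates, and the assumption that the two earlier periods serve as training data and the latest as validation data are all hypothetical, since the exact partition boundaries are not given here.

```python
from bisect import bisect_right
from datetime import date
from typing import List, NamedTuple, Sequence


class Record(NamedTuple):
    person_id: str      # hypothetical identifier
    index_date: date    # hypothetical: date the prediction window opens
    unsuppressed: bool  # outcome: viral load >200 copies/mL within the window


def partition_by_date(records: Sequence[Record],
                      cutoffs: Sequence[date]) -> List[List[Record]]:
    """Split records into consecutive calendar periods.

    Partition i holds records with cutoffs[i-1] <= index_date < cutoffs[i],
    so k cutoffs yield k + 1 partitions ordered from earliest to latest.
    """
    ordered = sorted(cutoffs)
    parts: List[List[Record]] = [[] for _ in range(len(ordered) + 1)]
    for record in records:
        # bisect_right places a record dated exactly on a cutoff in the later period.
        parts[bisect_right(ordered, record.index_date)].append(record)
    return parts


# Hypothetical cutoffs: two earlier (training) periods, one later (validation) period.
cutoffs = [date(2020, 1, 1), date(2021, 1, 1)]
records = [
    Record("a", date(2018, 6, 1), False),
    Record("b", date(2020, 1, 1), True),
    Record("c", date(2020, 12, 31), False),
    Record("d", date(2021, 3, 1), True),
]
train_1, train_2, validation = partition_by_date(records, cutoffs)
```

Because the outcome is observed over a year-long window, a fuller pipeline would also need to keep each training record's outcome window from extending into the validation period; this sketch does not handle that.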