Abstract
BACKGROUND: This study aims to develop and examine the performance of machine learning (ML) algorithms in predicting viral suppression among people living with HIV (PWH) statewide in South Carolina. METHODS: The study population, extracted through the electronic reporting system in South Carolina, comprised adult PWH who were diagnosed between 2005 and 2021. Viral suppression was defined as viral load <200 copies/mL. The predictors, including sociodemographics, historical information on viral load indicators (eg, viral rebound), comorbidities, health care utilization, and annual county-level factors (eg, social vulnerability), were measured in each 4-month window. Using historical information in different lag time windows (1, 3, or 5 lagged time windows, with each 4-month window as a unit), both traditional and ML approaches (eg, Long Short-Term Memory Network) were applied to predict viral suppression. Prediction performance was compared across models using area under the curve (AUC), recall, precision, F1 score, and Youden index. RESULTS: ML approaches outperformed the generalized linear mixed model. In all 3 lagged analyses, covering a total of 15,580 PWH, the Long Short-Term Memory Network algorithm (Lag 1: AUC = 0.858; Lag 3: AUC = 0.877; Lag 5: AUC = 0.881) outperformed all the other methods in terms of AUC for predicting viral suppression. The top-ranking predictors common to the different models included historical information on viral suppression, viral rebound, and viral blips in the Lag-1 time window. Inclusion of county-level variables did not improve model prediction accuracy. CONCLUSIONS: Supervised ML algorithms may offer better performance for risk prediction of viral suppression than traditional statistical methods.
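As an illustration of the threshold-based metrics named above (recall, precision, F1 score, and Youden index), the following is a minimal sketch using their standard definitions. It is not the study's code: the function name `classification_metrics` and the toy labels are hypothetical, and the labels are assumed to be binary with 1 meaning virally suppressed (viral load <200 copies/mL).

```python
def classification_metrics(y_true, y_pred):
    """Compute recall, precision, F1, and Youden index for binary labels.

    Assumption (not from the study): 1 = virally suppressed, 0 = not suppressed.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against empty denominators by falling back to 0.0.
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    youden = recall + specificity - 1                    # Youden's J statistic

    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "youden": youden,
    }


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Unlike AUC, these four metrics depend on the probability cutoff used to turn model scores into predicted labels, so they describe a model at one operating point rather than across all thresholds.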