Abstract
OBJECTIVE: To compare machine learning models for predicting loss to follow-up among people with HIV (PWH). MATERIALS AND METHODS: Using electronic medical record (EMR) data from 7340 PWH at a federally qualified health center, we developed machine learning models to predict loss to follow-up in HIV care. Unstructured text from clinical notes was analyzed using Bag of Words and Word Embedding natural language processing (NLP) approaches. RESULTS: A random forest model using structured data and Bag of Words (area under the receiver operating characteristic curve [AUC], 0.787; 95% CI, 0.776-0.798) outperformed a random forest model using structured data alone (AUC, 0.753; 95% CI, 0.741-0.765) and a random forest model using Bag of Words alone (AUC, 0.624; 95% CI, 0.610-0.638). DISCUSSION: A model combining structured EMR data with NLP of unstructured clinical notes performed better than models using structured EMR data alone or NLP alone in predicting loss to follow-up from HIV care.
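As an illustration only, the following minimal sketch shows one way Bag of Words counts from a clinical note could be concatenated with structured EMR fields into a single feature vector, and how an AUC can be computed from predicted scores. It is not the authors' implementation: the function names, the toy notes, and the structured fields (age and a prior missed-visit flag) are hypothetical, and the random forest itself is omitted. In practice a library such as scikit-learn (`CountVectorizer`, `RandomForestClassifier`, `roc_auc_score`) would typically supply these pieces.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a note and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(notes):
    """Return a sorted list of every distinct token across all notes."""
    return sorted({tok for note in notes for tok in tokenize(note)})


def bag_of_words(note, vocabulary):
    """Count how often each vocabulary term occurs in one note."""
    counts = Counter(tokenize(note))
    return [counts[term] for term in vocabulary]


def combine_features(structured_row, note, vocabulary):
    """Concatenate structured fields with the note's Bag of Words counts."""
    return list(structured_row) + bag_of_words(note, vocabulary)


def auc(labels, scores):
    """AUC as the probability that a positive case is scored above a
    negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, invented for illustration (not from the study).
notes = ["Patient missed appointment", "Patient adherent, no missed doses"]
structured = [[34, 1], [52, 0]]  # assumed fields: age, prior missed-visit flag

vocab = build_vocabulary(notes)
X = [combine_features(row, note, vocab) for row, note in zip(structured, notes)]
```

Each row of `X` holds the structured fields followed by one count per vocabulary term, which is the form of input a classifier trained on structured data plus Bag of Words would receive. Dropping either half of the row gives the structured-only or text-only variants of the comparison.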