Abstract
OBJECTIVE: Timely interventions can improve medication adherence, but it is challenging to identify at the point of care which patients are at risk of nonadherence. We aimed to develop and validate flexible machine learning (ML) models that predict a continuous measure of adherence to guideline-directed medication therapies (GDMTs) for heart failure (HF).

MATERIALS AND METHODS: We used a large electronic health record (EHR) cohort of 34,697 HF patients seen at NYU Langone Health who had an active prescription for ≥1 GDMT between April 1, 2021 and October 31, 2022. The outcome was adherence to GDMT, measured as the proportion of days covered (PDC) over the 6 months following a clinical encounter. More than 120 predictors, guided by the World Health Organization's model of barriers to adherence, captured patient-, therapy-, healthcare-, and neighborhood-level factors. We compared several ML models and their ensemble (superlearner) with a traditional regression model, ordinary least squares (OLS), for predicting PDC. Performance measures were mean absolute error (MAE) averaged across 10-fold cross-validation, the percentage increase in MAE relative to superlearner, and the predictive difference across deciles of predicted PDC.

RESULTS: Superlearner, a flexible nonparametric prediction approach, demonstrated superior predictive performance. Superlearner and quantile random forest had the lowest MAE (mean [95% CI], 18.9% [18.7%-19.1%] for both), followed by quantile neural network (19.5% [19.3%-19.7%]) and kernel support vector regression (19.8% [19.6%-20.0%]). Gradient boosted trees and OLS were the 2 worst-performing models, with MAEs 17% and 14% higher, respectively, than that of superlearner. Superlearner also showed improved predictive difference across deciles of predicted PDC.

CONCLUSION: This development-phase study suggests the potential of linked EHR-pharmacy data and ML to identify HF patients who would benefit from medication adherence interventions.

DISCUSSION: Fairness evaluation and external validation are needed before clinical integration.
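The abstract names the outcome (PDC) and the comparison metrics (cross-validated MAE and percentage increase in MAE relative to superlearner) without giving formulas. The sketch below shows one common way to compute them; it is an illustration, not the authors' implementation. Assumptions, all hypothetical: a single medication per patient, a fixed 180-day window standing in for "6 months", overlapping fills merged as a set union of days (surplus supply is not carried forward), and illustrative function names `pdc`, `mae`, `cv_mae`, and `pct_increase_vs_reference`.

```python
from datetime import date, timedelta


def pdc(fills, window_start, window_days=180):
    """Proportion of days covered (PDC) for one medication.

    fills: list of (fill_date, days_supply) tuples.
    Assumption: overlapping supplies are merged as a set union of days;
    surplus supply is NOT carried forward to later days.
    """
    window_end = window_start + timedelta(days=window_days)  # exclusive end
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            # Count only days that fall inside the observation window.
            if window_start <= day < window_end:
                covered.add(day)
    return len(covered) / window_days


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted PDC."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cv_mae(folds):
    """Average the per-fold MAE; folds is a list of (y_true, y_pred) pairs."""
    return sum(mae(t, p) for t, p in folds) / len(folds)


def pct_increase_vs_reference(mae_model, mae_reference):
    """Percentage increase in a model's MAE relative to a reference model's MAE."""
    return 100.0 * (mae_model - mae_reference) / mae_reference


if __name__ == "__main__":
    # Toy fills: a 90-day supply, then a 30-day refill that overlaps it.
    fills = [(date(2022, 1, 1), 90), (date(2022, 3, 20), 30)]
    # Jan 1 to Apr 18 is covered: 108 of 180 days.
    print(round(pdc(fills, date(2022, 1, 1)), 3))
```

Here MAE is on the 0 to 1 PDC scale; multiplying by 100 expresses it in percentage points, as in the abstract. Under this convention, a model whose MAE is 14% higher than the reference returns 14.0 from `pct_increase_vs_reference`. Many adherence studies instead shift overlapping fills forward, which can yield a higher PDC for patients who refill early.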