Abstract
BACKGROUND: Alcohol use disorder (AUD) drives substantial morbidity through alcohol-related liver disease. Accurate identification of AUD in electronic health records (EHRs) is critical for research and care delivery, yet algorithms based on International Classification of Diseases (ICD) codes miss many cases, and manual chart review is impractical at scale. Computable phenotypes (CPs) that integrate structured and unstructured EHR data offer a scalable alternative.

METHODS: Using University of Florida Health's Integrated Data Repository, which covers two million patients, we developed AUD CPs in two steps. First, we identified candidate cohorts using AUD-related ICD codes, medications, and keyword searches across structured and unstructured data. Second, we iteratively refined rule-based combinations of these criteria through manual chart review. We evaluated the final algorithms against gold-standard chart review, measuring sensitivity, positive predictive value (PPV), and F1-score, and then validated them in an independent testing set and an external dataset.

RESULTS: In the testing set, the F1-optimized CP achieved an F1-score of .87 (sensitivity: .98, PPV: .78), and the precision-optimized CP achieved a PPV of .9 (sensitivity: .68, F1-score: .77). Minimal performance attenuation from the training set to the testing set indicated robustness and generalizability. Both CPs substantially outperformed approaches restricted to AUD-specific ICD codes.

CONCLUSIONS: CPs that integrate structured and unstructured EHR data enable accurate, reproducible identification of AUD, surpassing traditional methods based on AUD-specific ICD codes. This approach facilitates efficient cohort construction for clinical research, public health surveillance, and quality improvement initiatives targeting AUD and its consequences, and it addresses a critical gap in identifying patients who may benefit from screening and intervention.
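The reported F1-scores follow from the harmonic mean of sensitivity (recall) and PPV (precision): F1 = 2 × sensitivity × PPV / (sensitivity + PPV). The minimal Python sketch recomputes both testing-set F1-scores from the reported sensitivity and PPV as a consistency check. The helper names (`sensitivity`, `ppv`, `f1_score`) are illustrative, and the confusion-matrix counts in the last lines are hypothetical, not study data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of chart-review-confirmed AUD cases that the CP flags."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Share of CP-flagged patients confirmed as AUD on chart review."""
    return tp / (tp + fp)


def f1_score(sens: float, ppv_value: float) -> float:
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sens * ppv_value / (sens + ppv_value)


# Testing-set sensitivity and PPV as reported in the abstract.
f1_optimized = f1_score(0.98, 0.78)
precision_optimized = f1_score(0.68, 0.90)

print(round(f1_optimized, 2))         # 0.87
print(round(precision_optimized, 2))  # 0.77

# Hypothetical counts for illustration only (not from the study):
# 49 true positives, 1 false negative, 14 false positives.
example_sens = sensitivity(tp=49, fn=1)
example_ppv = ppv(tp=49, fp=14)
example_f1 = f1_score(example_sens, example_ppv)
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two metrics, which is why the precision-optimized CP's high PPV still yields a lower F1 than the F1-optimized CP once its sensitivity drops to .68.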