Abstract
PURPOSE: Tobacco use is not commonly represented as computable information in the electronic health record (EHR). We developed an algorithm in the Veterans Health Administration (VHA) to identify tobacco ever-use among Veterans. METHODS: We used the VHA corporate data warehouse to develop an algorithm composed of multiple data types (health factors [semi-structured template data entry and decision support tools], billing, orders, medication, and encounter codes) to identify tobacco ever-use (current or former) versus never-use. Algorithm accuracy was compared against two reference standards: (1) an EHR abstraction cohort and (2) a Veteran self-reported survey cohort. For the EHR abstraction cohort, we calculated the sensitivity and positive predictive value (PPV) of the algorithm overall and stratified by data type. For the survey cohort, we calculated the sensitivity, specificity, PPV, and negative predictive value (NPV) of the algorithm overall and stratified by data type. RESULTS: Compared to data abstracted from the EHR, the algorithm correctly identified 424 of 426 individuals with tobacco ever-use: sensitivity 1.00 (95% CI 0.98-1.00); PPV 1.00 (95% CI 0.98-1.00). Compared to survey data, the algorithm correctly identified 514 of 547 participants with tobacco ever-use: sensitivity 0.94 (95% CI 0.92-0.96); PPV 0.88 (95% CI 0.85-0.91). Specificity was 0.53 (95% CI 0.45-0.62) and NPV was 0.70 (95% CI 0.61-0.79). Of all data types, health factors had the highest sensitivity in both cohorts. CONCLUSIONS: This novel tool had excellent sensitivity and PPV for tobacco ever-use in two cohorts. Future research should evaluate its use in supporting preventive healthcare services.
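The accuracy measures reported above come from a 2x2 comparison of algorithm output against a reference standard. As a minimal sketch (the function names are illustrative and are not the study's code), the sensitivity figures can be reproduced from the counts stated in the abstract. The false-positive and true-negative counts are not given here, so specificity, PPV, and NPV appear only as formulas.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of reference-standard ever-users that the algorithm flagged."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of reference-standard never-users that the algorithm did not flag."""
    return tn / (tn + fp)


def ppv(tp: int, fp: int) -> float:
    """Share of algorithm-flagged individuals who are ever-users per the reference."""
    return tp / (tp + fp)


def npv(tn: int, fn: int) -> float:
    """Share of unflagged individuals who are never-users per the reference."""
    return tn / (tn + fn)


# Counts from the abstract: correctly identified ever-users (true positives)
# out of all ever-users in each reference cohort; the remainder are false negatives.
ehr_sensitivity = sensitivity(tp=424, fn=426 - 424)
survey_sensitivity = sensitivity(tp=514, fn=547 - 514)

print(f"EHR abstraction cohort sensitivity: {ehr_sensitivity:.2f}")  # 1.00
print(f"Survey cohort sensitivity: {survey_sensitivity:.2f}")        # 0.94
```

The EHR cohort value is 424/426, roughly 0.995, which rounds to the reported 1.00. The confidence intervals are not reproduced because the abstract does not state which interval method was used.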