Abstract
Exposure measurement error is a ubiquitous but often overlooked challenge in causal inference with observational data. Existing methods accounting for exposure measurement error largely rely on restrictive parametric assumptions, while emerging data-adaptive estimation approaches allow for less restrictive assumptions but at the cost of flexibility, as they are typically tailored toward rigidly defined statistical quantities. There remains a critical need for assumption-lean estimation methods that are both flexible and possess desirable theoretical properties across a variety of study designs. In this paper, we introduce a general framework for estimation of causal quantities in the presence of exposure measurement error, adapted from the method of control variates. Our method can be implemented in various two-phase sampling study designs, where one obtains gold-standard exposure measurements for a small subset of the full study sample, called the validation data. The control variates framework leverages both the error-prone and error-free exposure measurements by augmenting an initial consistent estimator from the validation data with a variance reduction term formed from the full data. We show that our method inherits double-robustness properties under standard causal assumptions. Simulation studies show that our approach performs favorably compared to leading methods under various two-phase sampling schemes. We illustrate our method with observational electronic health record data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.