Abstract
We consider the challenges of causal inference in settings where data from a randomized trial are augmented with control data from an external source to improve efficiency in estimating the average treatment effect (ATE). This work is motivated by the SUNFISH trial, which investigated the effect of risdiplam on motor function in patients with spinal muscular atrophy. While the original analysis used only data generated by the trial, we explore an alternative analysis that incorporates external controls from the placebo arm of a historical trial. We cast the setting into a formal causal inference framework and show how these designs are characterized by a lack of full randomization to treatment and a heightened dependence on modeling. To address this, we outline sufficient causal assumptions about the exchangeability between the internal and external controls to identify the ATE, and we establish a connection with novel graphical criteria. Furthermore, we propose estimators, review efficiency bounds, develop an approach for efficient doubly robust estimation even when unknown nuisance models are estimated with flexible machine learning methods, suggest model diagnostics, and demonstrate the finite-sample performance of the methods through a simulation study. The ideas and methods are illustrated through their application to the SUNFISH trial, where we find that external controls can increase the efficiency of treatment effect estimation.