Abstract
Heterogeneous treatment effect (HTE) refers to the nonrandom, explainable variation in treatment effects across individuals in a population. HTE estimation is central to precision medicine, where accurate effect estimates can inform personalized treatment decisions. In practice, a patient's covariate profile may overlap with those of multiple studies, which raises the question of how best to inform treatment decisions in a multi-study setting. We propose a flexible statistical machine learning (ML) framework, the multi-study $R$-learner, that leverages multiple studies to estimate the HTE. Existing multi-study approaches often assume that the study-specific (i) conditional average treatment effect (CATE), (ii) expected potential outcome under no treatment given covariates, and (iii) treatment assignment mechanism are identical across studies, but these assumptions may not hold in practice because of differences in study populations, protocols, or designs. Our framework directly accounts for these three types of between-study heterogeneity. It builds upon recent advances in cross-study learning and uses a data-adaptive objective function to combine cross-study estimates of nuisance functions with study-specific CATEs via membership probabilities, which enable information to be borrowed across studies. The multi-study $R$-learner extends the $R$-learner to the multi-study setting and can flexibly incorporate ML techniques. Within the series estimation framework, we show that the proposed estimator is asymptotically normal and more efficient than the $R$-learner when there is between-study heterogeneity in the treatment assignment mechanisms. Using cancer data from randomized controlled trials and observational studies, we illustrate that the multi-study $R$-learner performs favorably in the presence of between-study heterogeneity.
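For orientation, the objective described in the abstract can be sketched in symbols. The notation here is assumed for illustration and does not appear in the abstract: $Y_i$ is the outcome, $A_i$ the binary treatment indicator, $X_i$ the covariates, $m(x) = E[Y \mid X = x]$ the conditional mean outcome, $e(x)$ the propensity score, $K$ the number of studies, $e_k(x)$ and $\tau_k(x)$ the propensity score and CATE in study $k$, $p(k \mid x)$ the membership probability of study $k$ given covariates $x$, and $\Lambda_n$ an optional regularizer. The standard single-study $R$-learner estimates the CATE by minimizing a residual-on-residual squared loss:

$$
\hat{\tau}(\cdot) = \arg\min_{\tau} \; \frac{1}{n} \sum_{i=1}^{n} \Big[ \{Y_i - \hat{m}(X_i)\} - \{A_i - \hat{e}(X_i)\}\,\tau(X_i) \Big]^2 + \Lambda_n(\tau).
$$

One plausible multi-study analogue, consistent with the description of combining nuisance estimates with study-specific CATEs via membership probabilities but not necessarily the exact objective used, replaces the single treatment residual with a membership-weighted sum of study-specific terms:

$$
\{\hat{\tau}_k(\cdot)\}_{k=1}^{K} = \arg\min_{\tau_1, \ldots, \tau_K} \; \frac{1}{n} \sum_{i=1}^{n} \Big[ \{Y_i - \hat{m}(X_i)\} - \sum_{k=1}^{K} \hat{p}(k \mid X_i)\,\{A_i - \hat{e}_k(X_i)\}\,\tau_k(X_i) \Big]^2 + \Lambda_n(\tau_1, \ldots, \tau_K).
$$

Under this sketch, if $e_k = e$ and $\tau_k = \tau$ for every study $k$, then $\sum_{k} \hat{p}(k \mid x) = 1$ collapses the inner sum to $\{A_i - \hat{e}(X_i)\}\,\tau(X_i)$, which recovers the single-study loss.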