Abstract
Mediation analysis helps uncover how exposures affect outcomes through intermediate variables. Traditional mean-based measures of the total mediation effect can suffer from cancellation of component-wise effects with opposite signs, and existing methods often lack the power to capture weak effects in high-dimensional mediators. Additionally, most existing work has focused on continuous outcomes, with limited attention to binary outcomes, particularly in case-control studies. To fill this gap, we propose an R2 total mediation effect measure under the liability framework, which has a causal interpretation and is applicable to various high-dimensional mediation models. We develop a cross-fitted, modified Haseman-Elston regression-based estimation procedure tailored for mediation analysis in case-control studies, which can also be applied to cohort studies. In extensive simulations, our estimator remains consistent in the presence of non-mediators and weak effect sizes. We provide theoretical justification of consistency under mild conditions, without requiring exact mediator selection. In a case-control substudy of the Women's Health Initiative involving 2150 individuals, we found that many metabolites were mediators with weak effects in the path from BMI to coronary heart disease, and an estimated 89% (95% CI: 73%-91%) of the variation in the underlying liability of coronary heart disease explained by BMI was mediated by the measured metabolites. The proposed estimation procedure is implemented in the R package "r2MedCausal", available on GitHub.