Abstract
Understanding the molecular mechanisms mediating the causal effects of epidemiological risk factors on complex traits can advance targeted disease interventions. Statistical mediation analysis facilitates this by disentangling direct and indirect causal effects. Current approaches to causal mediation leverage Mendelian randomization, using summary statistics from the exposure, mediator, and outcome studies that estimate the genetic effects of instruments. However, differences in sample size across these studies, and hence in the measurement error of the estimated genetic effects, lead to substantial bias and poorly controlled type I error rates for these methods. These problems become especially pronounced when simultaneously estimating the mediation proportions of numerous mediators. To address these limitations, we introduce Likelihood-based Mediation Analysis (LiMA), which estimates molecular mediation more accurately and robustly by jointly modeling the variability in all estimates involved. Through extensive simulation studies and benchmarking, we demonstrate that our approach achieves several-fold lower bias and better control of type I error than state-of-the-art methods. Applying our method to real data highlighted several plausible metabolites, such as glutamate and carnitine, as well as proteins that mediate the causal effects of obesity-related risk factors on cardiometabolic outcomes. These findings underscore the potential of our framework to reveal promising molecular pathways underlying complex diseases. By accommodating the variability inherent to summary statistics of varying precision, LiMA enables robust mediation analyses across large sets of mediators.