Abstract
Exposome studies involve analyzing numerous exposures with complex interactions and potential collinearity, which poses challenges for conventional statistical methods. Although Bayesian kernel machine regression (BKMR) has emerged as a promising solution, its widespread adoption has been hindered by high computational cost and limited interpretability. To address these critical limitations in large-scale exposome studies, we developed an advanced BKMR (A-BKMR) model. We used the Gaussian predictive process and matrix decomposition to reduce both processing time and memory requirements. Additionally, we employed the parametric g-formula to generate interpretable statistics, including joint and univariate effects as well as bivariate and multivariate interactions. Across scenarios with different sample sizes and numbers of exposures, A-BKMR demonstrated both high computational efficiency and strong model performance. Analyzing datasets with sample sizes of 100,000 was previously infeasible with traditional BKMR; A-BKMR completes such analyses in 1 h on a personal computer, making it over 700,000 times faster than conventional BKMR implementations. A-BKMR also accurately identifies important exposures while maintaining an area under the curve (AUC) > 0.99 and an R² > 0.97 across scenarios with varying sample sizes and numbers of exposures. Furthermore, A-BKMR introduces novel quantitative metrics for effect estimates and interaction analyses, substantially enhancing interpretability. These advancements establish A-BKMR as an efficient and interpretable statistical framework for future large-scale exposome studies.
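The abstract credits the speed-up to the Gaussian predictive process and a matrix decomposition without giving details. The following is a minimal sketch of the general idea, not the A-BKMR implementation. It replaces the full n×n BKMR Gaussian kernel matrix with the rank-m predictive-process approximation K_nm K_mm⁻¹ K_mn built from m knots. It then uses the Woodbury identity with a Cholesky factorization so that solving (σ²I + K_nm K_mm⁻¹ K_mn)x = y costs O(nm²) rather than O(n³). The function names `gaussian_kernel` and `gpp_solve`, the choice of knots as a subset of observations, the `jitter` stabilizer, and the specific Woodbury/Cholesky route are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(A, B, r):
    """BKMR-style Gaussian kernel: K(a, b) = exp(-sum_k r_k * (a_k - b_k)^2)."""
    diff = A[:, None, :] - B[None, :, :]               # shape (len(A), len(B), p)
    return np.exp(-np.einsum("ijk,k->ij", diff**2, r))


def gpp_solve(Z, knots, r, sigma2, y, jitter=1e-8):
    """Solve (sigma2*I + K_nm K_mm^{-1} K_mn) x = y without forming any n x n matrix.

    K_nm K_mm^{-1} K_mn is the rank-m predictive-process approximation of the
    full kernel matrix. By the Woodbury identity:
        (s I + U K_mm^{-1} U^T)^{-1} = (1/s) [I - U (s K_mm + U^T U)^{-1} U^T]
    so only an m x m system has to be factorized (hypothetical helper).
    """
    K_nm = gaussian_kernel(Z, knots, r)                # (n, m)
    K_mm = gaussian_kernel(knots, knots, r)            # (m, m)
    K_mm = K_mm + jitter * np.eye(len(knots))          # assumed numerical stabilizer
    inner = sigma2 * K_mm + K_nm.T @ K_nm              # (m, m), symmetric positive definite
    L = np.linalg.cholesky(inner)                      # inner = L @ L.T
    w = np.linalg.solve(L, K_nm.T @ y)                 # forward solve: L w = K_mn y
    tmp = np.linalg.solve(L.T, w)                      # back solve: L^T tmp = w
    return (y - K_nm @ tmp) / sigma2


# Usage: check against a dense O(n^3) solve on a small simulated problem.
rng = np.random.default_rng(0)
n, p, m = 300, 3, 10
Z = rng.standard_normal((n, p))        # hypothetical exposure matrix
y = rng.standard_normal(n)             # hypothetical outcome
r = np.ones(p)                         # assumed kernel scale parameters
knots = Z[:m]                          # assumption: knots = first m observations
sigma2 = 0.5                           # assumed residual variance

x_fast = gpp_solve(Z, knots, r, sigma2, y)

K_nm = gaussian_kernel(Z, knots, r)
K_mm = gaussian_kernel(knots, knots, r) + 1e-8 * np.eye(m)
K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)        # explicit n x n approximation
x_dense = np.linalg.solve(sigma2 * np.eye(n) + K_approx, y)
print(np.max(np.abs(x_fast - x_dense)))  # should be near floating-point precision
```

Only m×m matrices are factorized, so memory also scales as O(nm) rather than O(n²), which is consistent with the abstract's claim of reduced processing time and memory.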