Abstract
Distributed statistical modeling is a powerful tool for analyzing large-scale datasets while preserving data privacy. In this study, we propose a data-driven weighted aggregation procedure that leverages model prediction performance and adapts to heterogeneous distributed environments. The proposed procedure uses the squared prediction error matrix as the main transmitted quantity; because its dimension is the square of the number of workers, communication remains efficient. We show that the weights of the proposed estimates are asymptotically optimal in terms of quadratic loss and the corresponding risk, and we derive the limits of the data-driven weights. We also study the minimax property of the proposed nonparametric function estimates. To examine the finite-sample performance of the proposed procedure, we conduct Monte Carlo simulation studies. Furthermore, we illustrate the proposed methodology through an empirical analysis of a real-world dataset on heart rate prediction.