Abstract
OBJECTIVE: We propose the Heterogeneity-aware Collaborative One-shot Lossless Algorithm for Generalized Linear Model (COLA-GLM-H), a novel one-shot lossless distributed algorithm that integrates heterogeneous multi-institutional data while relying solely on institution-level summary information rather than patient-level data. MATERIALS AND METHODS: Generalized Linear Models (GLMs) are widely used in medical research for analyzing diverse outcome types. For multi-institution settings, we show that the global likelihood can be reconstructed from institution-level summary statistics alone, enabling lossless estimation without access to individual records. We validated COLA-GLM-H in two real-world studies: (1) an emulated U.S. pediatric centralized network (719,383 patients) evaluating long-term cardiovascular risks following COVID-19, and (2) an international decentralized network (120,429 hospitalized patients from seven databases across three countries) assessing risk factors for COVID-19 mortality. RESULTS: In the centralized network, COLA-GLM-H produced estimates identical to those from pooled analyses. In the decentralized network, the algorithm integrated heterogeneous data across multiple clinical institutions using a single communication round. CONCLUSIONS: COLA-GLM-H provides a lossless, communication-efficient, and computation-efficient solution for multi-institutional research using only institution-level summary data. It accounts for between-institution heterogeneity and supports all outcome types within the exponential family, enabling secure, scalable, and accurate analysis in collaborative clinical research.
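The lossless property claimed above can be illustrated with a minimal sketch. It is not the COLA-GLM-H implementation: it assumes a logistic model with categorical covariates, simulated data, and no site-specific effects, so the heterogeneity adjustment is omitted. The function names `site_summary` and `fit_logistic_grouped` are hypothetical. Each site shares only, for every distinct covariate pattern, the number of patients and the number of events; the grouped likelihood built from those counts equals the patient-level likelihood, so the two fits agree.

```python
import numpy as np

def site_summary(X, y):
    """Collapse patient-level rows into per-pattern counts and event sums."""
    patterns, inverse = np.unique(X, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against shape differences across NumPy versions
    n = np.bincount(inverse, minlength=len(patterns)).astype(float)
    s = np.bincount(inverse, weights=y, minlength=len(patterns))
    return patterns, n, s

def fit_logistic_grouped(patterns, n, s, iters=50):
    """Newton-Raphson on the grouped (binomial) logistic log-likelihood."""
    beta = np.zeros(patterns.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-patterns @ beta))
        grad = patterns.T @ (s - n * p)
        hess = (patterns * (n * p * (1.0 - p))[:, None]).T @ patterns
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-12:
            break
    return beta

# Simulated data for three hypothetical sites (assumption: illustration only).
rng = np.random.default_rng(0)
beta_true = np.array([-1.0, 0.8, -0.5])
sites = []
for size in (500, 800, 300):
    X = np.column_stack([np.ones(size), rng.integers(0, 2, size=(size, 2))]).astype(float)
    prob = 1.0 / (1.0 + np.exp(-X @ beta_true))
    y = (rng.random(size) < prob).astype(float)
    sites.append((X, y))

# Pooled analysis: every patient is its own "pattern" with n = 1.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = fit_logistic_grouped(X_all, np.ones(len(y_all)), y_all)

# Summary-level analysis: sites share only pattern counts and event sums, once.
summaries = [site_summary(X, y) for X, y in sites]
P = np.vstack([p for p, _, _ in summaries])
N = np.concatenate([n for _, n, _ in summaries])
S = np.concatenate([s for _, _, s in summaries])
beta_summary = fit_logistic_grouped(P, N, S)

print(np.allclose(beta_pooled, beta_summary, atol=1e-8))
```

In this sketch the same covariate pattern may appear once per site in the stacked summaries; that is harmless because the log-likelihood is a sum over rows. Continuous covariates or site-specific parameters would need a different summary than the one assumed here.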