Abstract
Multi-center collaborations are crucial for developing robust and generalizable machine learning models in medical imaging. Traditional approaches, such as centralized data sharing or federated learning (FL), face challenges including privacy concerns, communication overhead, and synchronization complexity. We present CATegorical and PHenotypic Image SyntHetic learnING (CATphishing), an alternative to FL that uses Latent Diffusion Models (LDM) to generate synthetic multi-contrast three-dimensional magnetic resonance imaging data for downstream tasks, eliminating the need for raw data sharing or iterative inter-site communication. Each institution trains an LDM to capture its site-specific data distribution, and the resulting synthetic samples are aggregated at a central server. We evaluate CATphishing on data from 2491 patients across seven institutions for isocitrate dehydrogenase mutation classification and three-class tumor-type classification. CATphishing achieves accuracy comparable to centralized training and FL, and the synthetic data exhibit high fidelity. This method addresses privacy, scalability, and communication challenges, offering a promising alternative for collaborative artificial intelligence development in medical imaging.
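The data flow described above (local generative training, then a single one-way transfer of labeled synthetic samples to a central server) can be illustrated with a toy sketch. This is not the CATphishing implementation: as a loudly stated assumption, each site's LDM is replaced by a hypothetical per-class Gaussian over flat feature vectors, and the function names `fit_site_model`, `sample_site_model`, and `aggregate` are illustrative only. The point is the protocol shape: raw arrays never leave a site, and there is no iterative communication.

```python
import numpy as np

def fit_site_model(x, y):
    """Fit a per-class Gaussian at one site (hypothetical stand-in for a site's LDM)."""
    model = {}
    for label in np.unique(y):
        xc = x[y == label]
        # Small epsilon keeps the standard deviation strictly positive.
        model[int(label)] = (xc.mean(axis=0), xc.std(axis=0) + 1e-6)
    return model

def sample_site_model(model, n_per_class, rng):
    """Draw labeled synthetic samples; only these would leave the site."""
    xs, ys = [], []
    for label, (mu, sigma) in model.items():
        xs.append(rng.normal(mu, sigma, size=(n_per_class, mu.shape[0])))
        ys.append(np.full(n_per_class, label))
    return np.concatenate(xs), np.concatenate(ys)

def aggregate(site_datasets, n_per_class, seed=0):
    """Central server pools synthetic data from every site in one round."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for x, y in site_datasets:
        model = fit_site_model(x, y)                          # trained locally
        sx, sy = sample_site_model(model, n_per_class, rng)   # one-way transfer
        xs.append(sx)
        ys.append(sy)
    return np.concatenate(xs), np.concatenate(ys)

# Usage: seven toy sites, three classes, 4-dimensional features.
rng = np.random.default_rng(42)
sites = []
for _ in range(7):
    y = rng.integers(0, 3, size=100)
    x = rng.normal(y[:, None], 1.0, size=(100, 4))
    sites.append((x, y))

X_syn, y_syn = aggregate(sites, n_per_class=10)
print(X_syn.shape)
```

A downstream classifier would then be trained on `X_syn` and `y_syn` alone; in the actual method the generator is a 3D multi-contrast LDM rather than this Gaussian placeholder.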