Abstract
Data heterogeneity critically limits distributed artificial intelligence (AI) in medical imaging. We propose HeteroSync Learning (HSL), a privacy-preserving framework that addresses heterogeneity through (1) a Shared Anchor Task (SAT) for cross-node representation alignment and (2) an Auxiliary Learning Architecture that coordinates the SAT with each node's local primary task. Validated through large-scale simulations covering feature, label, quantity, and combined heterogeneity, as well as a real-world multi-center thyroid cancer study, HSL outperforms local learning, 12 benchmark methods (FedAvg, FedProx, SplitAVG, FedRCL, FedCOME, etc.), and foundation models (e.g., CLIP), offering greater stability and gains of up to 40% in area under the curve (AUC) while matching centralized learning performance. On out-of-distribution pediatric thyroid cancer data, HSL achieves an AUC of 0.846, outperforming the alternatives by 5.1-28.2% and demonstrating superior generalization. Visualizations confirm that HSL homogenizes heterogeneous distributions. This work provides an effective solution for distributed medical AI, enabling equitable collaboration across institutions and advancing the democratization of healthcare AI.
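The auxiliary objective sketched in the abstract, where each node jointly optimizes its local primary-task loss and an alignment term toward the shared anchor, can be illustrated as follows. This is a minimal toy sketch, not the paper's implementation: the function names, the mean-squared-error alignment term, and the weighting `lam` are all illustrative assumptions.

```python
import numpy as np

def anchor_alignment_loss(node_embedding, anchor_embedding):
    # Illustrative alignment term: MSE between a node's representation
    # and the shared-anchor representation (assumed form, not from the paper).
    return float(np.mean((node_embedding - anchor_embedding) ** 2))

def node_objective(primary_loss, node_embedding, anchor_embedding, lam=0.5):
    # Hypothetical per-node objective: local primary-task loss plus a
    # weighted cross-node alignment term; lam is an assumed hyperparameter.
    return primary_loss + lam * anchor_alignment_loss(node_embedding, anchor_embedding)

# Toy example with made-up numbers for one node.
anchor = np.zeros(4)                      # shared anchor representation
emb = np.array([0.2, -0.2, 0.2, -0.2])    # one node's representation
total = node_objective(0.8, emb, anchor)  # 0.8 + 0.5 * 0.04 = 0.82
```

Under this sketch, heterogeneous nodes pull their representations toward a common anchor while still fitting their own primary tasks, which is the intuition behind the reported homogenization of distributions.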