Abstract
Image data acquisition often spans multiple platforms, devices, and heterogeneous sources, which complicates data security and privacy protection in collaborative learning. Traditional centralized learning paradigms struggle to reconcile multi-institutional collaboration with stringent data security requirements, while existing Federated Learning (FL) frameworks frequently suffer significant performance degradation on the complex features inherent in images. To address these gaps, this study introduces FL-LSNet, a federated learning framework built around a lightweight Large-Small Network (LSNet). Based on a client-server architecture, FL-LSNet safeguards local data privacy through decentralized preprocessing and addresses long-tailed data through dynamic weight adjustment mechanisms in the server-side aggregator. The core of the framework, LSNet, implements a "See Large, Focus Small" strategy with two components: (1) Large Kernel Perceptrons (LKP), which capture global contextual dependencies, and (2) Small Kernel Attention (SKA), which facilitates fine-grained local feature fusion. Empirical results show that LSNet reduces computational overhead by 7% compared with Swin Transformer while improving feature representation capability by 19% relative to the baseline model. Evaluations on three diverse datasets show that FL-LSNet consistently outperforms state-of-the-art federated algorithms, including FedAvg and MOON, with accuracies ranging from 84.32% to 98.92%. Ablation studies further validate the FedAvg-LSNet integration, which surpasses the baseline by 6.15% and achieves performance metrics exceeding 98%. This research establishes a scalable paradigm for multi-stakeholder data collaboration and offers new insights into the lightweight vertical adaptation of federated learning in public safety, dynamic monitoring, risk early warning, intelligent agriculture, and medical diagnosis.
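For orientation, the server-side aggregation step that FedAvg-style methods share can be sketched as below. This is a minimal illustration of standard FedAvg sample-size weighting only; the dynamic weight adjustment used by FL-LSNet is not specified in this abstract and is not reproduced here. The function name `fedavg`, the `Weights` type, and the flat name-to-values parameter layout are assumptions made for the example.

```python
from typing import Dict, List, Sequence

# Hypothetical flat parameter layout: parameter name -> list of values.
Weights = Dict[str, List[float]]


def fedavg(client_weights: Sequence[Weights], client_sizes: Sequence[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Weights = {}
    for name, values in client_weights[0].items():
        acc = [0.0] * len(values)
        for weights, size in zip(client_weights, client_sizes):
            coeff = size / total  # this client's share of all training samples
            for i, v in enumerate(weights[name]):
                acc[i] += coeff * v
        aggregated[name] = acc
    return aggregated
```

For example, with two clients holding 1 and 3 samples, `fedavg([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}], [1, 3])` weights the second client three times as heavily as the first and returns `{"w": [2.5, 3.5]}`. A long-tail-aware aggregator would presumably replace the fixed `size / total` coefficient with an adjusted one, but that choice is left open here.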