Abstract
Background/Objective: The integration of machine learning (ML) and deep learning (DL) has significantly enhanced medical image classification, especially in histopathology, by improving diagnostic accuracy and aiding clinical decision-making. However, data privacy concerns and restrictions on sharing patient data limit the development of effective DL models. Federated learning (FL) offers a promising solution by enabling collaborative model training across institutions without exposing sensitive data. This systematic review evaluates the current state of FL applications in histopathological image classification by identifying prevailing methodologies, datasets, and performance metrics, and by highlighting existing challenges and future research directions. Methods: Following PRISMA guidelines, 24 studies published between 2020 and 2025 were analyzed. The literature was retrieved from ScienceDirect, IEEE Xplore, MDPI, Springer Nature Link, PubMed, and arXiv. Eligible studies focused on FL-based DL models for histopathology image classification and reported performance metrics. Studies unrelated to FL in histopathology or lacking accessible full texts were excluded. Results: The included studies used 10 datasets (8 public, 1 private, and 1 unspecified) and reported classification accuracies ranging from 69.37% to 99.72%. FedAvg was the most commonly used aggregation algorithm (14 studies), followed by FedProx, FedDropoutAvg, and custom approaches. Only two studies named the FL framework they used (Flower and OpenFL). Frequently employed model architectures included VGG, ResNet, DenseNet, and EfficientNet. Performance was typically evaluated using accuracy, precision, recall, and F1-score. Conclusions: FL demonstrates strong potential for privacy-preserving digital pathology applications. However, key challenges remain, including communication overhead, computational demands, and inconsistent reporting standards, and addressing them is essential for broader clinical adoption. Future work should prioritize standardized evaluation protocols, efficient aggregation methods, model personalization, robustness, and interpretability, with validation across multi-institutional clinical environments, to fully realize the benefits of FL in histopathological image classification.