Abstract
Histopathological image classification with computational methods such as fine-tuned convolutional neural networks (CNNs) has gained significant attention in recent years. Graph neural networks (GNNs) have also emerged as strong alternatives, often employing CNNs or vision transformers (ViTs) as node feature extractors. However, because these extractors are usually pre-trained on small-scale natural image datasets, their performance on histopathology tasks can be limited. Foundation models trained on large-scale histopathological data now enable more effective feature extraction for GNNs. In this work, we integrate recently developed foundation models as feature extractors within a lightweight GNN and compare the resulting models with standard fine-tuned CNN and ViT models. Furthermore, we explore a prediction fusion approach that combines the outputs of the best-performing GNN and fine-tuned model to evaluate the benefits of complementary representations. The results show that GNNs using foundation model features outperform those trained with CNN or ViT features and achieve performance comparable to standard fine-tuned CNN and ViT models. The highest overall performance is obtained with the proposed prediction fusion strategy: evaluated on three publicly available datasets, the best fusion achieves F1-scores of 98.04%, 96.51%, and 98.28% and balanced accuracies of 98.03%, 96.50%, and 97.50% on PanNuke, BACH, and BreakHis, respectively.
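
To make the prediction fusion idea concrete, the following is a minimal sketch of one common form of late fusion: a weighted average of the class probabilities produced by two models, followed by an argmax. The abstract does not specify the fusion rule, so the equal-weight averaging, the function names (`fuse_predictions`, `predict_labels`, `balanced_accuracy`), and the toy probabilities are all assumptions made for illustration and do not reproduce the reported results.

```python
from typing import List, Sequence


def fuse_predictions(
    probs_a: Sequence[Sequence[float]],
    probs_b: Sequence[Sequence[float]],
    weight_a: float = 0.5,
) -> List[List[float]]:
    """Weighted average of two models' per-sample class probabilities.

    Assumption: both models output probabilities over the same classes
    in the same order. The weighting scheme is illustrative only.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("Both models must score the same number of samples")
    fused = []
    for pa, pb in zip(probs_a, probs_b):
        if len(pa) != len(pb):
            raise ValueError("Both models must use the same number of classes")
        fused.append([weight_a * a + (1.0 - weight_a) * b for a, b in zip(pa, pb)])
    return fused


def predict_labels(probs: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest-probability class for each sample."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


# Toy, made-up probabilities for four samples and two classes.
y_true = [0, 0, 1, 1]
gnn_probs = [[0.9, 0.1], [0.45, 0.55], [0.2, 0.8], [0.1, 0.9]]  # errs on sample 1
cnn_probs = [[0.8, 0.2], [0.7, 0.3], [0.3, 0.7], [0.55, 0.45]]  # errs on sample 3

fused_probs = fuse_predictions(gnn_probs, cnn_probs)
fused_labels = predict_labels(fused_probs)

print("GNN   :", balanced_accuracy(y_true, predict_labels(gnn_probs)))
print("CNN   :", balanced_accuracy(y_true, predict_labels(cnn_probs)))
print("Fused :", balanced_accuracy(y_true, fused_labels))
```

In this toy case each model misclassifies a different sample, so averaging their probabilities corrects both errors. This is the kind of complementarity the fusion experiment is meant to probe, although real gains depend on how correlated the two models' errors are.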