Abstract
Colorectal cancer (CRC) is one of the leading causes of cancer-related death worldwide, and early diagnosis is crucial for improving patient outcomes. Despite significant advances, accurate prediction from medical images remains challenging owing to weak supervision, data heterogeneity, and the sheer scale of whole-slide image datasets. In this paper, we propose a novel approach to colorectal cancer classification that integrates deep Gaussian processes (DGPs) with multiple-instance learning (MIL). Our method is designed for weakly labeled data, where only bag-level labels are available, and improves classification performance by employing a deep Gaussian process with random feature expansion (DGP-RF). In addition, an attention-based aggregation mechanism emphasizes key regions in whole-slide images, enhancing both interpretability and robustness. Experiments on the TCGA-CRC dataset show that our model outperforms strong convolutional baselines, achieving an AUC of 0.895, compared to 0.777 for ResNet, 0.791 for EfficientNet, and 0.784 for ShuffleNet. These results demonstrate the accuracy and robustness of our approach, offering a promising tool for automated cancer detection with potential for clinical deployment.
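To make the attention-based aggregation step concrete, the sketch below shows a generic attention pooling over a bag of patch embeddings, in the spirit of attention-based MIL. This is an illustrative NumPy sketch, not the paper's exact formulation: the parameter matrices `V` and `w` stand in for hypothetical learned attention weights, and the embedding dimensions are arbitrary.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_mil_pool(instances, V, w):
    """Aggregate a bag of instance embeddings into one bag embedding.

    instances: (n, d) array of patch features from one whole-slide image.
    V: (d, h) and w: (h,) are hypothetical learned attention parameters.
    Returns the (d,) bag embedding and the (n,) attention weights.
    """
    scores = np.tanh(instances @ V) @ w   # one scalar score per patch
    alpha = softmax(scores)               # weights sum to 1 over the bag
    bag_embedding = alpha @ instances     # attention-weighted average
    return bag_embedding, alpha

# Toy usage: a bag of 8 patches with 16-dim features.
rng = np.random.default_rng(0)
inst = rng.normal(size=(8, 16))
V = rng.normal(size=(16, 32))
w = rng.normal(size=(32,))
bag, alpha = attention_mil_pool(inst, V, w)
```

The attention weights `alpha` indicate which patches drive the bag-level prediction, which is the source of the interpretability claimed in the abstract: high-weight patches can be highlighted on the slide for inspection.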