Abstract
Whole-slide image (WSI) analysis requires integrating fine-grained spatial structure with long-range tissue context. This work introduces SlideMamba, a hybrid framework that performs embedding-level fusion of a graph neural network (capturing local topology) and a Mamba state-space branch (modeling global context) via entropy-based confidence weighting. The adaptive fusion emphasizes the branch with lower predictive entropy, providing a principled mechanism for combining complementary feature streams and improving multi-scale representation learning. Effectiveness is demonstrated on two clinically relevant tasks with class imbalance: (i) mutation/fusion prediction from OAK clinical trial WSIs (40×), where SlideMamba attains PRAUC [Formula: see text], exceeding a fixed-fusion baseline (GAT-Mamba [Formula: see text]) and single-branch baselines (Mamba [Formula: see text], SlideGraph+ [Formula: see text], MIL [Formula: see text], TransMIL [Formula: see text]); and (ii) LUAD vs. LUSC classification on an independent proprietary cohort (20×), where SlideMamba achieves a PRAUC of [Formula: see text], outperforming MIL (0.946 ± 0.037), TransMIL (0.929 ± 0.033), SlideGraph+ (0.945 ± 0.025), GAT-Mamba (0.935 ± 0.011), and Mamba (0.962 ± 0.012). Beyond these performance gains, the Mamba backbone ensures computational efficiency by avoiding the quadratic complexity of standard attention mechanisms. Furthermore, the adaptive fusion weights provide inherent interpretability, offering clinicians insight into whether local cellular graphs or global tissue architecture drove the final prediction. These attributes suggest SlideMamba offers a clinically feasible path toward spatially resolved, precision computational pathology.
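The entropy-based confidence weighting described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names and the specific choice of a softmax over negative entropies are illustrative assumptions; the only property taken from the abstract is that the branch with lower predictive entropy receives the larger fusion weight.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def entropy(probs, eps=1e-12):
    """Shannon entropy of a probability vector."""
    return -np.sum(probs * np.log(probs + eps))

def entropy_weighted_fusion(emb_gnn, emb_mamba, logits_gnn, logits_mamba):
    """Fuse two branch embeddings, weighting each inversely to its
    predictive entropy: the more confident (lower-entropy) branch
    contributes more. Hypothetical sketch; the actual SlideMamba
    weighting function may differ."""
    h_gnn = entropy(softmax(logits_gnn))
    h_mamba = entropy(softmax(logits_mamba))
    # Softmax over negative entropies -> weights sum to 1, and the
    # lower-entropy branch gets the larger weight.
    w = softmax(np.array([-h_gnn, -h_mamba]))
    fused = w[0] * emb_gnn + w[1] * emb_mamba
    return fused, w

# Toy example: the GNN branch is confident, the Mamba branch is not,
# so the GNN embedding dominates the fused representation.
fused, w = entropy_weighted_fusion(
    emb_gnn=np.ones(4),
    emb_mamba=np.zeros(4),
    logits_gnn=np.array([3.0, -3.0]),   # sharp prediction, low entropy
    logits_mamba=np.array([0.1, -0.1]), # near-uniform, high entropy
)
```

Because the weights are computed per slide, they double as the interpretability signal mentioned in the abstract: inspecting `w` reveals whether local graph structure or global context drove a given prediction.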