Abstract
INTRODUCTION: Deep learning approaches have become central to brain MRI analysis; however, their reliability under dataset shift remains a critical barrier to safe and scalable deployment in neuroscience and clinical research. Convolutional neural networks (CNNs) provide strong locality-driven inductive biases for robust feature extraction but lack global contextual awareness. Conversely, transformer-based architectures capture long-range dependencies but often exhibit reduced robustness and miscalibrated confidence when applied to heterogeneous medical imaging data, particularly in Cross-Dataset settings.
METHODS: We propose a calibration-aware hierarchical CNN-Transformer fusion framework for robust brain MRI analysis under dataset shift. The architecture integrates a pretrained multi-scale CNN backbone with a hierarchical transformer branch and performs scale-aligned fusion through cross-attention. Because local convolutional features selectively query global contextual representations, the design maintains stable feature contributions during fusion and mitigates overconfident reliance on transformer features when generalization degrades across datasets. The framework is evaluated under a strict Cross-Dataset protocol, in which models are trained on one dataset and tested on a distinct dataset.
RESULTS: The proposed fusion model achieves competitive classification performance while substantially improving probabilistic calibration relative to both CNN-only and transformer-only baselines. Specifically, the model attains an average accuracy of 99.20% and achieves a lower Expected Calibration Error (ECE = 0.0041), Brier score (0.0028), and Negative Log-Likelihood (NLL = 0.0277) than both a standalone Swin Transformer and a strong ResNet50 baseline.
DISCUSSION: These findings demonstrate that calibration-aware hierarchical CNN-Transformer fusion enhances both predictive reliability and robustness under Cross-Dataset evaluation. By improving the alignment between predictive confidence and empirical correctness, the proposed method supports safer large-scale analysis of heterogeneous brain MRI data, with important implications for multi-center neuroscience studies and trustworthy clinical decision support.
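For readers unfamiliar with the calibration metrics reported in the RESULTS, the following is a minimal NumPy sketch of how ECE, the Brier score, and NLL are commonly computed from predicted class probabilities. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names, the choice of 15 equal-width confidence bins, and the multiclass Brier definition (squared error summed over classes, averaged over samples) are all assumptions, and the toy inputs are synthetic rather than drawn from any dataset in the study.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: bin-weighted mean gap between accuracy and confidence.

    Assumption: equal-width bins over the top-class confidence;
    n_bins=15 is an illustrative default, not a value from the paper.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)                      # top-class confidence
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap            # weight by bin fraction
    return float(ece)


def brier_score(probs, labels):
    """Multiclass Brier score (assumed form: sum over classes, mean over samples)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))


def negative_log_likelihood(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    p_true = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))


if __name__ == "__main__":
    # Synthetic toy predictions for a two-class problem (not study data).
    toy_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
    toy_labels = [0, 1, 1, 1]
    print("ECE  :", expected_calibration_error(toy_probs, toy_labels))
    print("Brier:", brier_score(toy_probs, toy_labels))
    print("NLL  :", negative_log_likelihood(toy_probs, toy_labels))
```

All three metrics are lower-is-better. ECE summarizes the mismatch between confidence and empirical accuracy, whereas the Brier score and NLL are proper scoring rules that also penalize confident errors, which is why they are typically reported together.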