Abstract
Integrating structural magnetic resonance imaging (sMRI) with deep learning is an important research direction for the automated diagnosis of Alzheimer's disease (AD). Convolutional neural networks (CNNs) have become a mainstream approach owing to their powerful feature extraction capabilities. However, existing high-performing CNN-based voxel models are typically constrained to a single spatial scale, which hinders the capture of the complex, spatially distributed brain atrophy patterns characteristic of AD and often limits model interpretability. To address these limitations, we propose BMSSnet, an interpretable AD recognition model based on a multi-scale, multi-block attention mechanism. BMSSnet adopts a CNN-Transformer hybrid architecture: it first captures local anatomical details with a 3D feature extraction network, then applies a dual-branch multi-scale attention mechanism to patches of different sizes, enabling the Transformer to model global long-range dependencies. In addition, we devise a lightweight spatial gating unit that enables spatial feature interaction while preserving computational efficiency. For interpretability, the model localizes decision-critical three-dimensional regions of interest (3D ROIs) from attention weights and aligns them with anatomical atlases to verify their pathological relevance. Extensive experiments on the ADNI dataset demonstrate that BMSSnet not only achieves superior diagnostic performance but also accurately localizes AD-associated salient brain regions, offering reliable clinical interpretability.
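To make the two components named in the abstract concrete, the following is a minimal numpy sketch, not the paper's implementation: it assumes non-overlapping cubic patch tokenization at two scales for the dual-branch design, and a gMLP-style spatial gating unit (split channels, spatially project one half, gate the other). All sizes, weights, and function names are illustrative stand-ins.

```python
import numpy as np

def patchify_3d(volume, patch):
    """Split a cubic volume into non-overlapping cubic patches and
    flatten each patch into a token vector (illustrative tokenizer)."""
    n = volume.shape[0] // patch
    tokens = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                blk = volume[i*patch:(i+1)*patch,
                             j*patch:(j+1)*patch,
                             k*patch:(k+1)*patch]
                tokens.append(blk.ravel())
    return np.stack(tokens)                 # (n**3, patch**3)

def spatial_gating_unit(x, W, b):
    """Lightweight spatial gating (gMLP-style assumption): split the
    channel dimension, project one half along the token (spatial)
    axis, and use it to gate the other half elementwise.
    x: (n_tokens, d); W: (n_tokens, n_tokens); b: (n_tokens, 1)."""
    u, v = np.split(x, 2, axis=-1)          # two (n_tokens, d/2) halves
    v = W @ v + b                           # spatial projection of v
    return u * v                            # elementwise gating

rng = np.random.default_rng(0)
vol = rng.standard_normal((16, 16, 16))     # toy stand-in for an sMRI volume

# Dual-branch tokenization at two hypothetical patch sizes (8 and 4)
coarse = patchify_3d(vol, 8)                # (8, 512): few large-context tokens
fine = patchify_3d(vol, 4)                  # (64, 64): many fine-detail tokens

# Gate the fine-grained branch; W near zero, b near one approximates
# an identity mapping at initialization
n, d = fine.shape
W = rng.standard_normal((n, n)) * 0.01
b = np.ones((n, 1))
out = spatial_gating_unit(fine, W, b)
print(out.shape)                            # (64, 32): half the channels survive
```

The gating keeps cost low because the only learned interaction is a single token-by-token linear map, rather than the quadratic query-key attention a Transformer block would otherwise apply at every layer.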