Abstract
Alzheimer’s disease (AD), one of the most widespread neurodegenerative disorders, can be mitigated through early recognition and treatment. Recent research has shown that multimodal fusion is effective for early-stage AD diagnosis. However, most existing methods do not adequately account for the domain differences among data modalities, their interconnections, and their relative relevance. In this paper, we introduce a robust Intra-scale Interaction and Cross-scale Fusion Network (ISI-CSFN) for AD progression detection. The proposed model employs a linearized convolutional attention module to enable interaction between global information captured by the Cascaded Transformer (CTransformer) and local features extracted by the Depthwise Separable Convolution Network (DSCN). This mechanism enhances discriminative ability for AD progression detection by allowing each modality-specific branch to incorporate complementary contextual representations from the others while preserving the integrity of its own features. Furthermore, the model integrates background (BG) information with multimodal temporal data to predict several cognitive score variables simultaneously. The proposed method achieves strong results on both regression and multi-class progression tasks: accuracy reaches 97.26% for NC vs. AD, 89.25% for NC vs. sMCI, and 84.74% for NC vs. pMCI classification. In addition, the model attains the highest correlation coefficient and the lowest root mean square error (RMSE) across several clinical score regression tasks.
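The abstract refers to a linearized attention module. The paper's exact formulation is not given here; as a hypothetical illustration only, the sketch below shows the standard linearization idea, where softmax(QKᵀ)V is replaced by φ(Q)(φ(K)ᵀV) with a positive feature map φ, so the cost drops from O(N²d) to O(Nd²) by exploiting associativity. All function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Illustrative linearized attention (not the paper's exact module).

    Q, K: (N, d) query/key matrices; V: (N, d_v) value matrix.
    """
    # Positive feature map phi(x) = elu(x) + 1, a common choice for
    # linearized attention; keeps all similarity terms non-negative.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)

    # Associativity trick: compute phi(K)^T V once (d x d_v), then
    # multiply by phi(Q) -- O(N d^2) instead of the O(N^2 d) of
    # materializing the full N x N attention matrix.
    KV = Kp.T @ V                                   # (d, d_v)
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T        # (N, 1) row normalizer
    return (Qp @ KV) / (Z + eps)                    # (N, d_v)

rng = np.random.default_rng(0)
N, d, dv = 8, 4, 4
Q, K, V = (rng.normal(size=(N, d)),
           rng.normal(size=(N, d)),
           rng.normal(size=(N, dv)))
out = linear_attention(Q, K, V)
print(out.shape)
```

In a fusion setting such as the one described, Q could come from one modality branch and K, V from another, letting each branch attend to complementary context at linear cost.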