Abstract
Accurate detection of Parkinson's disease (PD) from structural MRI remains challenging because PD-related neuroanatomical alterations are diffuse and heterogeneous. This study introduces HyCoSwin-PD, a hybrid deep learning framework that integrates ConvNeXt-V2 and Swin Transformer to jointly model fine-grained local morphology and hierarchical global context. ConvNeXt-V2 provides strong convolutional inductive biases for capturing subtle structural variations, whereas Swin Transformer contributes multi-scale contextual reasoning through window-based self-attention. A dedicated fusion mechanism unifies these complementary representations into a single latent space optimized for PD classification. Evaluated on the PPMI dataset, HyCoSwin-PD achieves 95.8% accuracy, 95.1% sensitivity, and 96.4% specificity. Ablation analyses further confirm the benefit of combining convolutional and transformer-based encoders. Despite these promising outcomes, the reliance on a unimodal MRI dataset and a limited cohort underscores the need for multi-modal and multi-center validation. Overall, HyCoSwin-PD provides a robust, methodologically novel, and clinically relevant framework for MRI-based PD detection.

• HyCoSwin-PD introduces a hybrid architecture that integrates ConvNeXt-V2 for local morphological encoding with Swin Transformer for hierarchical global context modeling.
• The framework incorporates a feature fusion module that unifies heterogeneous representations to enhance discriminative capacity in MRI-based PD detection.
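For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, and specificity are conventionally derived from binary confusion-matrix counts, with PD as the positive class. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce its reported figures.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp: PD scans predicted as PD          fn: PD scans predicted as control
    tn: control scans predicted as control fp: control scans predicted as PD
    """
    positives = tp + fn   # all true PD cases
    negatives = tn + fp   # all true control cases
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one PD and one control sample")
    accuracy = (tp + tn) / (positives + negatives)
    sensitivity = tp / positives   # true positive rate (recall on PD)
    specificity = tn / negatives   # true negative rate (recall on controls)
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only (not results from the paper).
acc, sens, spec = classification_metrics(tp=45, fn=5, tn=48, fp=2)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the PD and control groups differ in size, since accuracy alone can mask weak performance on the smaller class.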