Abstract
Parkinson's disease (PD) is a progressive neurodegenerative disorder characterized by motor and non-motor impairments, and its early diagnosis remains challenging because clinical assessment is largely subjective. Recent artificial intelligence (AI)-based approaches have shown promise in identifying subtle PD biomarkers from individual modalities such as speech, gait, and handwriting; however, unimodal systems often fail to capture the heterogeneity of the disease and offer limited interpretability. To address these limitations, this study proposes a multimodal deep learning framework that integrates handwriting, gait, and speech modalities through an early feature-fusion strategy for robust and interpretable PD detection. Each modality is processed by a dedicated deep neural network feature-extraction pipeline, after which the static features are concatenated and classified with an XGBoost model. Model transparency is enhanced with explainable AI (XAI) techniques, namely SHapley Additive exPlanations (SHAP) and Gradient-weighted Class Activation Mapping (Grad-CAM), which expose modality- and feature-level contributions for clinical interpretation. Experimental evaluation on benchmark datasets shows that the proposed trimodal fusion model achieves an accuracy of 92%, outperforming the unimodal handwriting (91%), gait (90%), and speech (74%) models. The fusion framework attains a macro F1-score of 0.89, an area under the ROC curve (AUC) of 0.95, and an average precision (AP) of 0.96, indicating strong discriminative capability and robustness. Confusion-matrix analysis reveals balanced sensitivity (90%) and specificity (89%) across classes. Explainability analysis confirms that handwriting tremor patterns, gait force asymmetries, and speech spectral instabilities are key contributors to PD prediction. These results highlight the effectiveness of explainable multimodal AI in delivering accurate, reliable, and clinically interpretable solutions for early PD detection.
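For concreteness, the minimal sketch below illustrates the early-fusion and classification steps summarized above: per-modality embeddings are concatenated into a single static feature vector, classified with XGBoost, and attributed with SHAP. The feature dimensions, variable names, and synthetic data are illustrative assumptions, not taken from the paper; in practice the arrays would come from the modality-specific deep feature extractors.

```python
import numpy as np
import shap
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical per-modality embeddings standing in for the outputs of the
# deep feature-extraction pipelines; shapes and data are illustrative only.
rng = np.random.default_rng(0)
n = 200
hw_feats = rng.normal(size=(n, 64))      # handwriting embeddings
gait_feats = rng.normal(size=(n, 128))   # gait embeddings
speech_feats = rng.normal(size=(n, 40))  # speech embeddings
labels = rng.integers(0, 2, size=n)      # 0 = healthy control, 1 = PD

# Early fusion: static concatenation of all three modalities' features.
X = np.concatenate([hw_feats, gait_feats, speech_feats], axis=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25,
                                          random_state=0)

# Classify the fused feature vector with an XGBoost model.
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                    eval_metric="logloss")
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))

# Feature-level attribution with SHAP; TreeExplainer supports tree ensembles.
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_te)  # one attribution per fused feature
```

Under this setup, per-feature SHAP values could be aggregated over each modality's block of columns to estimate modality-level contributions; Grad-CAM would instead be applied inside the convolutional feature extractors, upstream of the fusion shown here.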