Abstract
Accurately forecasting the progression of Parkinson's disease (PD) motor symptoms in early-to-moderate stages is essential for timely intervention and personalized patient care, but it remains challenging because symptom evolution is heterogeneous and longitudinal. We present a novel dynamic, context-aware, multi-modal deep learning framework that predicts future motor symptom severity by integrating voice biomarkers obtained through signal processing, clinical progression features, demographic metadata, and semantically enriched patient summary embeddings derived from clinical narratives via natural language processing. The architecture uses bidirectional LSTMs augmented with multi-head self-attention to capture complex temporal dependencies while preventing information leakage. To ensure robust evaluation despite the limited sample size (42 patients), we used repeated 5-fold cross-validation at the patient level (8 repetitions, 40 folds in total), a more rigorous protocol than a single round of 5-fold cross-validation. Our approach achieves R² = 0.9925 ± 0.0027, RMSE = 0.67 ± 0.19, and MAE = 0.50 ± 0.15, with all 40 folds reaching R² > 0.989, significantly outperforming classical machine learning baselines ([Formula: see text] and 0.002785) and all previously published methods on this dataset. Cross-validated ablation studies (240 model trainings in total: 6 configurations × 40 folds) show that clinical features alone establish a strong baseline (R² = 0.9887 ± 0.0043), while text embeddings provide the largest incremental gain (3.82% RMSE reduction). Voice biomarkers contribute modestly to accuracy (2.72% RMSE reduction) but substantially improve stability (tenfold lower variability). The full multi-modal model achieves the best performance (7.50% RMSE reduction vs. the clinical-only model) with the lowest variability (coefficient of variation = 0.27%), indicating that dynamic cross-modal fusion improves both accuracy and robustness.
These findings, validated through 40 independent evaluations with each patient tested 8 times, demonstrate that integrating engineered temporal dynamics and contextual embeddings through advanced temporal modeling enables accurate longitudinal predictions of early-to-moderate PD progression. Complete code and implementation details are publicly available to ensure reproducibility.
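The evaluation protocol described above (repeated 5-fold cross-validation at the patient level, 8 repetitions, 42 patients) can be illustrated with a minimal sketch. This is not the authors' released code: the function name `repeated_patient_kfold`, the patient IDs, and the seed are hypothetical, and the round-robin fold assignment is an assumption chosen for simplicity. The sketch only shows how splitting on patients rather than on individual visits keeps every record of a patient on one side of the split, and how the counts reported in the abstract (40 folds, 8 tests per patient, 240 ablation trainings) follow from the design.

```python
import random
from collections import Counter


def repeated_patient_kfold(patient_ids, n_splits=5, n_repeats=8, seed=0):
    """Yield (repeat, fold, train_ids, test_ids), keeping each patient whole.

    Hypothetical helper for illustration; splitting is done on patient IDs,
    so all visits of one patient fall entirely in train or entirely in test.
    """
    rng = random.Random(seed)
    ids = sorted(set(patient_ids))
    for rep in range(n_repeats):
        shuffled = ids[:]
        rng.shuffle(shuffled)  # a fresh patient ordering for every repetition
        # Deal patients round-robin so fold sizes differ by at most one.
        folds = [shuffled[i::n_splits] for i in range(n_splits)]
        for k, test_ids in enumerate(folds):
            held_out = set(test_ids)
            train_ids = [p for p in shuffled if p not in held_out]
            yield rep, k, train_ids, sorted(held_out)


# Hypothetical patient identifiers; only the cohort size (42) is from the text.
patients = [f"P{i:02d}" for i in range(42)]
splits = list(repeated_patient_kfold(patients))

# How often each patient appears in a held-out test fold.
test_counts = Counter(p for _, _, _, test in splits for p in test)

# Ablation budget: 6 configurations, each trained once per fold.
n_configs = 6
total_trainings = n_configs * len(splits)

# Coefficient of variation of R^2 from the reported mean and standard deviation.
cv_percent = 100 * 0.0027 / 0.9925

print(len(splits), total_trainings, round(cv_percent, 2))
```

With 42 patients and 5 folds, each repetition produces folds of 9, 9, 8, 8, and 8 patients, and every patient is held out exactly once per repetition, which matches the "each patient tested 8 times" statement. The last line also shows that the reported R² mean and standard deviation are consistent with a coefficient of variation of about 0.27%.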