Abstract
BACKGROUND: Alzheimer's disease (AD) exhibits highly heterogeneous clinical courses. Early, accurate prediction and subgroup identification remain challenging because existing approaches rely on single-modality data and coarse subtype schemes. OBJECTIVE: To develop and validate a multimodal framework that integrates 3D MRI and clinical indicators to (1) stratify patients into clinically meaningful progression subtypes and (2) forecast individual memory/cognitive trajectories at 6, 12, and 48 months. METHODS: Using ADNI-2 (n = 453), we extracted 3D T1-weighted MRI features via a pre-trained Med3D network and combined them with cognitive, functional, and genetic indicators. Non-negative matrix factorization (NMF) projected patients into a two-dimensional progression space, and K-means clustering defined three prognostic subgroups ("Low," "Mild," "Fast"). We compared several longitudinal architectures (CNN, Transformer, LSTM variants, ConvLSTM) and assessed interpretability with SHAP. RESULTS: Clustering metrics (silhouette score peaking at k = 3) supported three distinct trajectories. A stacked LSTM performed best on image-only inputs, whereas a standard LSTM performed best on indicator-only inputs. The multimodal LSTM with attention achieved the lowest errors, with MAE of 0.196, 0.203, and 0.261 at 6, 12, and 48 months and corresponding accuracies of 0.903, 0.845, and 0.791. SHAP highlighted memory- and language-related features as the dominant contributors. CONCLUSION: An interpretable, fully automated multimodal framework enables robust subgroup stratification and individualized cognitive forecasting up to four years ahead, supporting personalized prognosis and targeted clinical decision-making.
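
The stratification step described in METHODS (NMF projection to two dimensions, K-means clustering, and selection of k by silhouette score) can be illustrated with a minimal sketch. This is not the authors' implementation: the scikit-learn calls are real, but the function name `stratify`, the candidate range of k, the 20-column feature matrix, and the random synthetic data standing in for ADNI-2 features are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.metrics import silhouette_score


def stratify(features, k_values=(2, 3, 4, 5), seed=0):
    """Project non-negative features to 2-D with NMF, then choose k by silhouette.

    `features` is assumed to be a (n_patients, n_features) array with no
    negative entries, as NMF requires.
    """
    nmf = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=seed)
    embedding = nmf.fit_transform(features)  # shape: (n_patients, 2)

    scores, labels_by_k = {}, {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels_by_k[k] = km.fit_predict(embedding)
        scores[k] = silhouette_score(embedding, labels_by_k[k])

    best_k = max(scores, key=scores.get)  # k with the highest silhouette score
    return embedding, labels_by_k[best_k], scores


# Synthetic stand-in for the fused MRI + clinical feature matrix (hypothetical).
rng = np.random.default_rng(0)
X = rng.random((453, 20))
embedding, labels, scores = stratify(X)
print({k: round(float(s), 3) for k, s in scores.items()})
```

On real data, the per-k scores printed at the end are what would be inspected for a peak such as the one reported at k = 3; with the random input used here, no particular peak should be expected.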