Abstract
Early diagnosis of cognitive decline is vital for timely treatment of mild cognitive impairment (MCI) and Alzheimer's disease (AD), yet standard clinical assessments often miss subtle longitudinal changes in language. We propose a hierarchical hybrid intelligence framework that integrates long-context language modeling, temporal progression modeling, semantic graph reasoning, psycholinguistic biomarkers, and contrastive progression learning to classify patient states (Normal, MCI, AD) from longitudinal electronic health record (EHR) notes. The model was trained on 4500 patients and 68,000 clinical notes from the Medical Information Mart for Intensive Care III (MIMIC-III) and externally validated on the Medical Information Mart for Intensive Care IV (MIMIC-IV) clinical notes dataset (5200 patients, 72,000 notes). Inputs combined Biomedical and Clinical Bidirectional Encoder Representations from Transformers (BioClinicalBERT) embeddings, Bidirectional Long Short-Term Memory (Bi-LSTM) temporal encodings, Graph Sample and Aggregate (GraphSAGE)-based Unified Medical Language System (UMLS) concept graphs, and psycholinguistic vectors (lexical diversity, grammatical complexity, discourse coherence). On the MIMIC-III hold-out set, the model achieved 99.999% accuracy, a macro F1-score of 0.999, a Receiver Operating Characteristic Area Under the Curve (ROC AUC) of 0.999, and a temporal stability variance of 0.0008. Monte Carlo cross-validation (10,000 folds) yielded 99.997 ± 0.003% accuracy and a macro F1-score of 0.999 ± 0.001. Feature ablation confirmed distinct gains from the temporal, semantic, and psycholinguistic modules, which together improved performance by 1.1% over text-only baselines. Cross-cohort zero-shot testing on MIMIC-IV showed strong generalization, with minimal decline in macro F1-score and balanced accuracy.
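As an illustration of the evaluation protocol named above, the following minimal sketch shows how a macro F1-score and a Monte Carlo cross-validation summary (mean and standard deviation over repeated random splits) can be computed. It is not the authors' implementation: the function names, the patient-level split, the 20% test fraction, and the `evaluate` callback are all assumptions made for this example.

```python
import random
from statistics import mean, stdev

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)

def monte_carlo_cv(patient_ids, evaluate, n_splits=100, test_fraction=0.2, seed=0):
    """Repeat random train/test splits and summarise the resulting scores.

    `evaluate(train_ids, test_ids)` is a hypothetical callback that trains a
    model on the training patients and returns one score for the test patients.
    Splitting by patient (an assumption) keeps all notes of one patient on the
    same side of the split.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = list(patient_ids)
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test_ids, train_ids = shuffled[:n_test], shuffled[n_test:]
        scores.append(evaluate(train_ids, test_ids))
    return mean(scores), stdev(scores)
```

Unlike k-fold cross-validation, each repetition here draws a fresh random split, so test sets may overlap across repetitions; the reported spread is the standard deviation of the score over those repetitions. `n_splits` must be at least 2 for the standard deviation to be defined.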
Explainability analyses, including SHapley Additive exPlanations (SHAP) token/concept attribution, attention maps, counterfactual perturbations, and psycholinguistic feature importance, identified clinically interpretable predictors of decline, such as pronoun overuse, reduced lexical diversity, and syntactic simplification. By providing longitudinally stable predictions, our framework supports scalable, non-invasive early screening across a variety of healthcare settings.
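To make the psycholinguistic markers mentioned above concrete, the following minimal sketch computes three simple per-note proxies: a type-token ratio for lexical diversity, the share of personal pronouns for pronoun overuse, and mean sentence length as a rough stand-in for syntactic complexity. The abstract does not specify how these features were computed, so the proxies, the tokenizer, the pronoun list, and all function names are assumptions chosen for illustration.

```python
import re

# Illustrative list of English personal pronouns (an assumption, not the
# authors' lexicon).
PRONOUNS = {
    "i", "me", "my", "you", "your", "he", "him", "his", "she", "her",
    "it", "its", "we", "us", "our", "they", "them", "their",
}

def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexical_diversity(tokens):
    """Type-token ratio: distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def pronoun_ratio(tokens):
    """Fraction of tokens that are personal pronouns."""
    return sum(t in PRONOUNS for t in tokens) / len(tokens) if tokens else 0.0

def mean_sentence_length(text):
    """Average number of tokens per sentence, split on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)

def psycholinguistic_vector(text):
    """Bundle the three proxies into one feature vector for a note."""
    tokens = tokenize(text)
    return [lexical_diversity(tokens), pronoun_ratio(tokens), mean_sentence_length(text)]
```

Under these assumptions, a falling type-token ratio, a rising pronoun ratio, and shorter sentences across a patient's successive notes would correspond to the markers of decline described above. The type-token ratio is sensitive to note length, so a length-normalized variant may be preferable when notes vary widely in size.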