Abstract
INTRODUCTION: There is a need to detect and intervene in the pre-clinical phases of Alzheimer's disease (AD). Electronic health records (EHRs), combined with machine learning methods, may help predict AD. METHODS: We identified EHRs for 19,473 cases with AD and 111,922 controls. Records spanned 10 or more years prior to AD diagnosis. Using a 75%/25% train/test split, we trained a random forest model on 2,499 features with 5-fold cross-validation to predict AD 10 years prior to its onset, and then computed permutation feature importance. RESULTS: The model achieved an area under the ROC curve (AUC) of 0.80. Feature importance identified factors associated with AD, including age, sex, race, ethnicity, BMI, cardiovascular diseases, inflammation, pain, sleep and mood disorders, trauma, other neurodegenerative disorders, diuretics, colon-related disorders and procedures, seizures, and vitamin B12. DISCUSSION: This is the first EHR-based model to predict AD 10 years prior to onset, and it could inform prevention and early intervention.
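The two evaluation quantities named in METHODS and RESULTS, the area under the ROC curve and permutation feature importance, can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline: the data are synthetic, the `predict` function is a hypothetical stand-in for a trained random forest, and the names `roc_auc` and `permutation_importance` are chosen here for illustration. In practice a library such as scikit-learn (`RandomForestClassifier`, `sklearn.inspection.permutation_importance`) would likely be used instead.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Fraction of (case, control) pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in AUC when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = roc_auc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted_auc = roc_auc(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted_auc)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Synthetic data (assumption): feature 0 carries signal, feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(300)]
y = [1 if row[0] + data_rng.gauss(0, 0.2) > 0.5 else 0 for row in X]


def predict(row):
    """Hypothetical stand-in for a trained model's risk score; uses feature 0 only."""
    return row[0]


baseline, importances = permutation_importance(predict, X, y)
print(f"baseline AUC: {baseline:.3f}")
for j, imp in enumerate(importances):
    print(f"feature {j}: mean AUC drop {imp:.3f}")
```

Shuffling the informative feature should pull the AUC toward 0.5, while shuffling a feature the model ignores leaves the AUC unchanged. A ranking of this kind, computed on held-out data, is what a permutation-based feature importance analysis reports.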