Abstract
INTRODUCTION: We aimed to develop risk and prognostic prediction tools for early-onset dementia (EOD) using health record data shared across five major international cohorts.

METHODS: More than 400,000 dementia-free individuals younger than age 65 at baseline were included. Ensemble learning was used to construct the models. Cumulative incidence and Kaplan-Meier curves were used to visualize risk stratification, and subgroup analyses were conducted to evaluate potential disparities.

RESULTS: The CatBoost-based risk model achieved an area under the receiver-operating characteristic curve (AUROC) of 0.814 (<70 years) and 0.892 (<65 years). The Random Survival Forest (RF) prognostic model reached a 5-year AUROC of 0.656. Key predictors included age, employment status, and education.

DISCUSSION: Using health record data, this study provides practical and scalable tools for EOD risk screening and prognosis prediction, with potential for implementation in community and primary care settings.

HIGHLIGHTS: We developed risk and prognostic prediction models for early-onset dementia (EOD) using indicators shared across five international cohorts. Models showed good discrimination and calibration across internal and external validation sets, with key predictors including age and work status confirmed by SHapley Additive exPlanations (SHAP) analysis. Subgroup analyses supported fairness across sex, age, and comorbidity groups. Our study provides accessible, cost-effective tools for the screening, prevention, and prognostic prediction of EOD in large community populations and primary care settings.
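
For readers less familiar with the discrimination metric reported above, the following is a minimal sketch of how an AUROC can be computed from predicted risk scores. It uses the pairwise interpretation of the metric: the probability that a randomly chosen case receives a higher score than a randomly chosen non-case. The function name `auroc` and the toy labels and scores are hypothetical illustrations, not the study's code or data.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (case, non-case) pairs in which the case
    has the higher predicted score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed the outcome, 0 = did not.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
toy_auc = auroc(labels, scores)
print(round(toy_auc, 3))
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, so for cohorts of the size described here a rank-based implementation from an established library would be the practical choice.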