Abstract
BACKGROUND: Whether survival at extreme ages can be accurately predicted remains unclear. This study explored the feasibility of using machine learning (ML) and electronic health records (EHRs) to predict mortality in centenarians and identify key survival determinants. METHODS: We analyzed 9718 centenarians (83% women) from the population-based EHR database in Hong Kong (2004-2018). Data were randomly split into 70% training and 30% testing cohorts. Using 82 predictors, including demographics, diagnoses, prescriptions, and laboratory results, we trained stepwise logistic regression and four ML algorithms to predict 1-year, 2-year, and 5-year all-cause mortality after age 100. Model performance was evaluated using discrimination (area under the receiver operating characteristic curve [AUROC]) and calibration metrics. In an independent cohort of 174 606 oldest-old adults aged 85-105 years, we further compared AUROCs of models incorporating the identified predictors versus comorbidity and frailty scores across different age groups. RESULTS: Among the ML models, the eXtreme Gradient Boosting algorithm provided the best performance, with AUROCs of 0.707 (95% CI = 0.685-0.730) for 1-year mortality and 0.704 (95% CI = 0.686-0.723) for 2-year mortality in the testing cohort. However, all models showed poor calibration for 5-year mortality. The top three predictors of mortality were lower albumin levels, more frequent hospitalizations, and higher urea levels. Models including these predictors consistently outperformed comorbidity and frailty scores in predicting mortality among oldest-old adults. CONCLUSIONS: ML models applied to routinely collected EHRs can predict short-term survival in centenarians with moderate accuracy. Further research is needed to determine whether mortality predictors differ by age in the oldest-old population.
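For readers unfamiliar with the discrimination metric reported above, the following is a minimal sketch of how an AUROC and a 95% confidence interval can be computed on a held-out 30% testing cohort. It is not the study's code: the data are synthetic, the risk scores stand in for the predicted probabilities of any fitted model, and the function names (`auroc`, `bootstrap_ci`) and the percentile bootstrap are assumptions chosen for illustration. The abstract does not state how its confidence intervals were derived.

```python
import random


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney) identity; tied scores share an average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUROC (an illustrative choice, not the study's method)."""
    auroc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic cohort: label 1 = died within the horizon, 0 = survived (hypothetical data).
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.4 else 0 for _ in range(1000)]
    # Noisy risk score that is higher on average for deaths; stands in for model output.
    scores = [0.5 * y + rng.gauss(0, 0.6) for y in labels]

    # Random 70% / 30% split; only the testing cohort is used for evaluation.
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    test_idx = indices[int(0.7 * len(indices)):]
    y_test = [labels[i] for i in test_idx]
    s_test = [scores[i] for i in test_idx]

    low, high = bootstrap_ci(y_test, s_test)
    print(f"AUROC = {auroc(y_test, s_test):.3f} (95% CI = {low:.3f}-{high:.3f})")
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so values near 0.70 indicate moderate discrimination. Calibration, which the abstract reports as poor for 5-year mortality, is a separate property and is not assessed by this sketch.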