Abstract
BACKGROUND: Machine learning (ML) models have shown good performance in predicting cardiovascular disease risk. However, their usefulness for predicting sudden cardiac death (SCD) risk from long-term follow-up data in electronic health records (EHRs) has yet to be fully elucidated. This study aimed to develop and validate ML models for SCD risk prediction among community-dwelling older adults using 6 years of primary-care EHR data.
METHODS: Data were obtained from the Jinnan Study, an EHR-based cohort of older adults. A total of 30,535 individuals aged ≥ 65 years from three towns were included. The primary outcome was SCD. Candidate predictors included gender, age, heart rate, body mass index (BMI), waist circumference, systolic blood pressure (SBP), diastolic blood pressure (DBP), physical activity, current smoking status, diabetes, hypertension, fasting blood glucose (FBG), triglycerides (TG), total cholesterol (TC), serum creatinine (Scr), blood urea nitrogen (BUN), QTc prolongation and ST wave abnormality. We used data from one town as the training dataset (16,049 individuals) and data from the other two towns as the test dataset (14,486 individuals). Two feature selection strategies were used: the Cox proportional hazards model (Cox model) and the random survival forest model (RSF model). Cox, Fine-Gray, RSF, lasso Cox regression and boosted Cox regression models were fitted and optimized through 5-fold cross-validation on the training dataset, then externally validated on the test dataset. Model discrimination and calibration were assessed.
RESULTS: Over a median follow-up of 5.67 years, 224 SCD events occurred. The RSF model outperformed the other models, with a C-index of 0.820 in the training dataset and 0.757 in the test dataset (P < 0.001). Calibration plots indicated that most SCD events occurred in individuals within the highest deciles of predicted risk. Summary plots generated with the "survex" package were used to improve the interpretability of the RSF model.
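Discrimination above is reported as the C-index. As a hedged illustration of what that number measures, the sketch below computes a simplified Harrell's C-index for right-censored data in plain Python. It is not the authors' implementation; the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped (a simplification of how full implementations handle ties).

```python
def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's concordance index for right-censored survival data.

    times: follow-up time per individual.
    events: 1 if the event (e.g., SCD) was observed, 0 if censored.
    risk_scores: predicted risk; higher means earlier expected event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            # Pair (i, j) is comparable if i had an observed event strictly
            # before j's follow-up time (tied times are skipped here).
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk failed first: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2.0, 3.5, 4.0, 5.0, 6.0]
    events = [1, 1, 0, 1, 0]
    risk = [0.9, 0.4, 0.5, 0.7, 0.1]
    print(harrell_c_index(times, events, risk))  # 6 of 8 pairs concordant -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported test-set value of 0.757 means the model ranked roughly three in four comparable pairs correctly.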
CONCLUSION: ML models can significantly enhance the prediction of SCD risk using community-based EHRs. Our proposed risk model may enable the identification of high-risk individuals among older adults, facilitating targeted interventions and personalized care strategies. Future research should focus on integrating this model into routine primary care workflows and evaluating its effectiveness in real-world settings.