Abstract
OBJECTIVE: To develop and internally validate interpretable machine learning (ML) models for predicting incident cardiovascular disease (CVD) among middle-aged and older Chinese adults across cardiovascular-kidney-metabolic (CKM) stages 0-3, excluding CKM stage 4 (established CVD) by design, and to characterize stage-specific risk patterns using SHapley Additive exPlanations (SHAP).
METHODS: Using data from 6049 adults aged ≥45 years without baseline CVD in the China Health and Retirement Longitudinal Study, we developed interpretable ML models to predict incident, self-reported CVD during follow-up across CKM stages 0-3. Feature selection combining the Boruta algorithm with recursive feature elimination identified 11 key predictors. Five ML algorithms (logistic regression, random forest [RF], Extreme Gradient Boosting, Light Gradient Boosting Machine, and multilayer perceptron) were trained and evaluated using a stratified 70/30 train-test split. SHAP analysis was used to interpret the models and to characterize stage-specific risk profiles.
RESULTS: During a median 6-year follow-up, 1373 participants (22.7%) developed CVD, with incidence increasing progressively from 11.9% in Stage 0 to 24.4% in Stage 3 (p < 0.001). The RF model showed moderate discriminative ability (area under the receiver operating characteristic curve = 0.704), with balanced sensitivity and specificity, and consistently outperformed other models across CKM stages. Model calibration was evaluated in the independent test set using calibration curves, which showed generally consistent agreement between predicted and observed CVD risk across CKM stages, with modest deviations at higher predicted risk levels. SHAP analysis identified age, systolic blood pressure, triglycerides, waist circumference, and C-reactive protein as key contributors to CVD risk prediction, revealing distinct stage-specific importance patterns and non-linear associations.
CONCLUSIONS: This interpretable ML framework provides a stage-specific CVD risk stratification approach across the CKM spectrum and may inform future risk assessment research in middle-aged and older Chinese adults. However, external validation and clinical utility evaluation are required before clinical translation.
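
For readers unfamiliar with the evaluation setup named in METHODS, the following is a minimal sketch of a stratified 70/30 split and of the area under the receiver operating characteristic curve, written with the Python standard library only. It is not the authors' code. The helper names `stratified_split` and `auroc` are hypothetical, the outcome labels are placeholders built from the reported counts (1373 events among 6049 participants), the four example scores are invented for illustration, and no model is trained.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.30, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio similar in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                         # shuffle within each class
        n_test = round(len(idx) * test_frac)     # per-class share sent to the test set
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def auroc(y_true, scores):
    """AUROC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Placeholder labels reproducing only the reported counts (assumption: no real data).
labels = [1] * 1373 + [0] * (6049 - 1373)
train_idx, test_idx = stratified_split(labels)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(test_rate, 3))

# Toy scores (invented) to show how discrimination is scored on a held-out set.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Stratifying on the outcome keeps the event rate in the test set close to the overall 22.7%, so the held-out estimate of discrimination is not distorted by an unrepresentative share of events.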