Abstract
Hyperglycemia is a major risk factor for chronic kidney disease (CKD). Following the TRIPOD+AI guidelines, this multicenter prospective study developed and validated a machine learning (ML) model that predicts CKD risk in prediabetic and diabetic populations to support early intervention. Participants were enrolled from four communities: data from three sites were split into training (80%) and internal test (20%) sets, and the fourth site served as the external validation set. Five ML algorithms were trained, and SHapley Additive exPlanations (SHAP) was applied to interpret the optimal model. The XGBoost model showed strong discrimination, with AUCs of 0.905, 0.809, and 0.837 in the training, internal test, and external validation sets, respectively. Serum creatinine (Scr), age, and hemoglobin (Hb) were the leading predictors, with higher Scr, older age, and lower Hb associated with elevated CKD risk. Stratifying participants by predicted risk (low: 0%-5%, medium: 5%-25%, high: 25%-100%) yielded distinct CKD incidences of 0.7%, 9.9%, and 55.5%, respectively (p < 0.001). An online prediction tool was also established for community screening. This validated ML model enables accurate risk prediction and stratification in hyperglycemic individuals, providing a feasible approach to early CKD detection and targeted prevention in community healthcare. Trial registration: Not applicable. The study is not a clinical trial.
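The risk stratification step can be illustrated with a minimal sketch that maps a predicted CKD probability to one of the three reported tiers. The function name `stratify_ckd_risk` is hypothetical, and the handling of the shared boundaries is an assumption: the abstract lists 5% and 25% in two tiers each, so the sketch assigns each boundary value to the higher tier. It does not reproduce the study's model or online tool.

```python
def stratify_ckd_risk(probability: float) -> str:
    """Map a predicted CKD probability (0.0 to 1.0) to a risk tier.

    Cut-offs follow the abstract: low 0%-5%, medium 5%-25%, high 25%-100%.
    Assumption: boundary values (exactly 5% or 25%) fall in the higher tier,
    since the abstract does not say which tier owns them.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < 0.05:
        return "low"
    if probability < 0.25:
        return "medium"
    return "high"


# Hypothetical predicted probabilities, for illustration only.
example_probabilities = [0.01, 0.05, 0.12, 0.25, 0.80]
example_tiers = [stratify_ckd_risk(p) for p in example_probabilities]
print(example_tiers)
```

In practice the input would be the probability produced by the fitted model for one participant; the tier could then guide how closely that participant is followed up.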