Abstract
Background: Chronic kidney disease (CKD) is a prevalent complication among individuals with type 2 diabetes (T2D) and is difficult to diagnose in resource-limited settings because of infrequent testing and missed hospital visits. This study aimed to develop a simple, effective machine learning (ML) model to identify T2D patients at high risk for reduced kidney function. Methods: We retrospectively analyzed data from 3471 T2D patients collected over a ten-year period at a university hospital in Bangkok, Thailand. Two models were developed using readily available clinical features: one including hemoglobin A1c (HbA1c) levels (the "with-HbA1c" model) and one excluding HbA1c levels (the "non-HbA1c" model). Three tree-based ML algorithms were employed: decision tree, random forest, and extreme gradient boosting (XGBoost). The outcome label was CKD, defined as an estimated glomerular filtration rate (eGFR) < 60 mL/min/1.73 m² that persisted for more than 90 days. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Feature importance was assessed using Shapley additive explanations (SHAP). Results: The XGBoost algorithm demonstrated strong predictive performance. The "with-HbA1c" model achieved an AUROC of 0.824, while the "non-HbA1c" model attained a comparable AUROC of 0.819. Both models were well calibrated. SHAP analysis identified age, HbA1c, and systolic blood pressure as the most influential predictors. Conclusions: Our simplified, interpretable ML models can effectively stratify the risk of reduced kidney function in patients with T2D using minimal, routine data. These models represent a promising step toward integration into clinical practice, such as through electronic health record (EHR)-based alerts or patient-facing mobile applications, to improve early CKD detection, particularly in resource-limited settings.
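The outcome label described in the Methods (eGFR < 60 mL/min/1.73 m² persisting for more than 90 days) can be illustrated with a minimal Python sketch. This is not the study's code. The function name `has_persistent_low_egfr`, the input layout (a list of date and eGFR pairs per patient), and the rule that any reading at or above the threshold resets the run are all assumptions made for illustration; the abstract does not specify how persistence was operationalized.

```python
from datetime import date

EGFR_THRESHOLD = 60.0  # mL/min/1.73 m^2, threshold stated in the abstract
PERSISTENCE_DAYS = 90  # "more than 90 days", as stated in the abstract


def has_persistent_low_egfr(measurements):
    """Return True if eGFR stayed below the threshold for more than 90 days.

    `measurements` is an iterable of (date, egfr) pairs for one patient.
    Assumption (hypothetical rule, not taken from the study): a run of
    consecutive below-threshold readings counts as persistent once two
    readings in that run are more than PERSISTENCE_DAYS apart; a reading
    at or above the threshold resets the run.
    """
    run_start = None
    for day, egfr in sorted(measurements, key=lambda m: m[0]):
        if egfr < EGFR_THRESHOLD:
            if run_start is None:
                run_start = day  # first low reading of a new run
            elif (day - run_start).days > PERSISTENCE_DAYS:
                return True
        else:
            run_start = None  # recovery above threshold breaks the run
    return False


# Hypothetical patients, for illustration only.
patient_a = [(date(2020, 1, 10), 55.0), (date(2020, 3, 1), 52.0), (date(2020, 5, 15), 50.0)]
patient_b = [(date(2020, 1, 10), 55.0), (date(2020, 3, 1), 65.0), (date(2020, 5, 15), 50.0)]

print(has_persistent_low_egfr(patient_a))  # True: low readings span 126 days
print(has_persistent_low_egfr(patient_b))  # False: the run is reset on 2020-03-01
```

Under this rule, a patient with a single low reading, or with low readings exactly 90 days apart, is not labeled as CKD; a different persistence rule would change which patients receive the positive label.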