Abstract
AIMS: This study aimed to develop an interpretable machine learning (ML) model for predicting advanced diabetic kidney disease (DKD), so that at-risk patients can be identified early and receive timely, appropriate clinical intervention. METHODS: Variables were selected by combining the least absolute shrinkage and selection operator (LASSO) with recursive feature elimination (RFE). Prediction models were constructed and validated using eight ML algorithms, and performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score, Brier score, calibration curves, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) and partial dependence plots (PDP) were used to interpret the model both locally and globally. Finally, the prediction model was deployed as a Shiny-based web application for direct use by clinicians and patients. RESULTS: Serum creatinine, age, hemoglobin, serum urea, serum alkaline phosphatase (ALP), serum uric acid (UA), platelet count, serum osmolality, serum bicarbonate, and monocyte count were identified as the most important variables in the advanced DKD model. Eight ML models were developed using these ten variables. Among them, the logistic regression (LR) model demonstrated accurate predictive ability in both internal and external validation, with AUCs of 0.948 (95% CI: 0.920-0.975) and 0.898 (95% CI: 0.883-0.913), respectively. The LR model also performed well in terms of accuracy, sensitivity, PPV, NPV, F1 score, and Brier score. The calibration curve and DCA indicated a high degree of agreement between the predicted and observed risks of the LR model, with a net benefit across almost the full range of threshold probabilities.
The model is freely available to clinicians as an LR-based online calculator: https://dev2333.shinyapps.io/logistics1/. CONCLUSION: This study developed and validated an interpretable LR model for predicting advanced DKD. By identifying individuals at higher risk of advanced DKD at an early stage, the LR model can support clinical practice, allow patients to receive timely and personalized treatment, and provide a basis for improving patient prognosis and optimizing the use of medical resources.
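As an illustration of the evaluation metrics named in METHODS, the following minimal sketch computes accuracy, sensitivity, specificity, PPV, NPV, F1 score, and Brier score from predicted probabilities, together with the net benefit used in DCA (true-positive rate minus false-positive rate weighted by the odds of the threshold probability). It is not the study's code: the function names `evaluate` and `net_benefit`, the 0.5 classification threshold, and the toy labels and probabilities are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float
    brier: float


def _confusion(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) after classifying prob >= threshold as positive."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Safe division: undefined ratios are reported as NaN."""
    return numerator / denominator if denominator else float("nan")


def evaluate(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics plus the Brier score for binary outcomes."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and equal in length")
    n = len(y_true)
    tp, fp, tn, fn = _confusion(y_true, y_prob, threshold)
    # Brier score: mean squared difference between predicted risk and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n
    return BinaryMetrics(
        accuracy=(tp + tn) / n,
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
        brier=brier,
    )


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability, as plotted in a decision curve."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp, fp, _, _ = _confusion(y_true, y_prob, threshold)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    risks = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    print(evaluate(labels, risks))
    print(net_benefit(labels, risks, threshold=0.5))
```

Evaluating `net_benefit` over a grid of thresholds and plotting the results against the "treat all" and "treat none" strategies yields a decision curve; the AUC and calibration curve would need additional code that is not sketched here.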