Abstract
The global prevalence of type 2 diabetes mellitus (T2D) has risen dramatically, as has that of diabetic kidney disease (DKD). We aimed to compare the accuracy of 4 machine learning (Mach-L) methods with that of multiple linear regression (MLR) in predicting future estimated glomerular filtration rate (eGFR) in T2D patients and to rank the importance of DKD risk factors. The study was conducted from 2013 to 2019. A total of 907 T2D patients were followed up for 4 years. Data on potential DKD risk factors were collected and calculated. We used 4 Mach-L methods to predict eGFR: classification and regression tree, random forest, artificial neural network, and eXtreme Gradient Boosting. Simple correlation analysis was used to survey the relationships between baseline risk factors and eGFR at the end of follow-up (eGFRend). Traditional MLR served as the benchmark against which the Mach-L methods were evaluated. For model interpretability, Shapley additive explanation was applied to quantify the contribution of each feature to the prediction model and the direction of its impact. Of the 4 Mach-L methods, random forest, classification and regression tree, and eXtreme Gradient Boosting outperformed MLR in predicting eGFRend. The 6 most important risk factors for predicting eGFRend were, in order, body mass index (BMI), baseline high-density lipoprotein cholesterol (HDL-C), baseline urine microalbumin creatinine ratio (MCR), baseline low-density lipoprotein cholesterol (LDL-C), duration of diabetes, and age. Shapley additive explanation indicated that age, duration of diabetes, HDL-C, and LDL-C were positively related to eGFRend, whereas BMI and MCR were negatively related to it. In conclusion, 3 of the 4 Mach-L methods predicted eGFRend more accurately than traditional MLR. BMI was the most influential factor for eGFRend, followed by HDL-C, baseline urine MCR, LDL-C, duration of diabetes, and age.
These findings highlight the potential of Mach-L to enhance early risk stratification for DKD, enabling timely interventions to preserve renal function in T2D patients.
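To illustrate how Shapley additive explanation assigns each feature a signed contribution, the following minimal Python sketch computes exact Shapley values for a single prediction by enumerating all feature coalitions. The model `toy_model`, its coefficients, the feature values, and the baseline are hypothetical and are not taken from the study; the sketch is not the study's implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    brute-force form is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical linear eGFR model; coefficients are invented."""
    bmi, hdl_c = features
    return 80.0 - 2.0 * bmi + 3.0 * hdl_c


x = [30.0, 1.5]         # hypothetical patient: BMI, HDL-C
baseline = [25.0, 1.0]  # hypothetical reference values
phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(["BMI", "HDL-C"], phi):
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{name}: {contribution:+.2f} ({direction} the predicted eGFR)")
```

The sign of each value gives the direction of a feature's impact on this prediction, and the values sum to the difference between the prediction for the sample and the prediction at the baseline. Practical tools approximate these values rather than enumerating every coalition.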