Abstract
BACKGROUND: Type 2 diabetes mellitus (T2DM) is associated with kidney damage, and microalbuminuria (MAU) is an early marker of the risk of progression to severe renal and cardiovascular complications. Effective prediction tools are urgently needed to identify MAU risk in T2DM patients and prevent adverse outcomes. This study aimed to develop a machine learning-based model to improve the early identification of high-risk individuals and facilitate timely, personalized interventions. METHODS: The electronic medical records of 4170 patients were retrospectively extracted from the diabetes-specific database of Nanjing Drum Tower Hospital (Ethics approval number: 2021-403-02). The data were divided into training and testing sets at an 8:2 ratio, and a random forest-based recursive feature elimination method was used to identify the most pertinent input variables. Ten features were selected for the construction of the prediction model. Five machine learning models were applied to predict the progression to MAU. Shapley additive explanations (SHAP) values were used for model interpretation to assess feature contributions. RESULTS: For predicting the progression to MAU, the LightGBM model demonstrated the best performance of the five models (AUC 0.85, 95% CI 0.82-0.88). By analyzing the SHAP values of the model outputs, we identified the key risk factors for predicting MAU at both the cohort and individual levels. CONCLUSIONS: This study developed an interpretable machine learning model to predict MAU in T2DM patients. Using baseline data, the model enables risk stratification and identification of high-risk individuals to guide personalized clinical interventions and treatment optimization.
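The headline result is reported as an AUC with a 95% confidence interval, but the abstract does not say how that interval was obtained. As an illustration only, the sketch below computes an AUC from predicted risk scores and a percentile bootstrap interval around it. The bootstrap method, the function names `auc_score` and `bootstrap_auc_ci`, and the parameters `n_boot`, `alpha`, and `seed` are assumptions made for this example, not details taken from the study.

```python
import random


def auc_score(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile bootstrap interval (assumed method)."""
    point = auc_score(y_true, y_score)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if len(set(labels)) < 2:
            continue  # AUC is undefined when a resample holds a single class
        scores = [y_score[i] for i in idx]
        stats.append(auc_score(labels, scores))
    stats.sort()
    lo_idx = max(0, round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return point, stats[lo_idx], stats[hi_idx]


# Toy labels (1 = progressed to MAU) and predicted risks, invented for illustration.
labels = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
risks = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.60, 0.90, 0.30]
auc, ci_low, ci_high = bootstrap_auc_ci(labels, risks, n_boot=500)
print(f"AUC {auc:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

The pairwise comparison keeps the definition of AUC visible but scales with the product of the two class sizes. For a held-out set of several hundred patients, as an 8:2 split of 4170 records would give, a rank-based AUC computation would likely be a better choice.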