Abstract
BACKGROUND: This study aimed to compare three machine learning algorithms for constructing a hypoglycemia risk prediction model in hospitalized patients with type 2 diabetes, identify the optimal model, and validate it, in order to support early clinical identification of high-risk patients.
METHODS: A case-control design was adopted. Clinical data were retrospectively collected from 1,167 patients with type 2 diabetes hospitalized in the endocrinology department of a tertiary hospital between January and December 2024. Patients were divided into a hypoglycemia group (220 cases) and a non-hypoglycemia group (947 cases). After predictive variables were screened using LASSO regression, the data were randomly split into a training set (934 cases) and a validation set (233 cases) at an 8:2 ratio. Prediction models were constructed on the training set using Logistic Regression, Random Forest (RF), and Extreme Gradient Boosting (XGBoost), and internal validation was performed on the validation set to assess predictive performance. The optimal model was selected by jointly evaluating the area under the ROC curve (AUC) and the F1 score. The SHAP (Shapley Additive Explanations) method was applied for interpretability analysis.
RESULTS: The incidence of hypoglycemia was 18.85% (220/1,167). LASSO regression identified nine key predictive variables: random C-peptide, insulin-containing fluid infusion, BMI, length of hospital stay, age, renal dysfunction, albumin level, lipohypertrophy, and insulin antibodies, all of which were statistically significant (P < 0.05). The XGBoost model exhibited the best predictive performance in both the training set (AUC = 0.853) and the validation set (AUC = 0.910), significantly outperforming the other two models. SHAP analysis revealed the contribution of each feature to the prediction.
CONCLUSION: The prediction model developed with the XGBoost algorithm demonstrated superior discriminative performance, providing a reliable tool for clinical identification of hospitalized type 2 diabetes patients at high risk of hypoglycemia. CLINICAL TRIAL NUMBER: Not applicable. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12902-025-02104-x.
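
The cohort arithmetic and the two selection metrics named in METHODS (AUC and F1 score) can be illustrated with a minimal, standard-library-only sketch. This is not the study's code: the function names (`split_sizes`, `roc_auc`, `f1_score`), the 0.5 decision threshold, and the five-patient toy data are assumptions introduced purely for illustration. Only the cohort counts (1,167 patients, 220 hypoglycemia cases) and the 8:2 split come from the abstract.

```python
def split_sizes(n_total, train_frac=0.8):
    """Return (n_train, n_validation) for a random split at the given fraction."""
    n_train = round(n_total * train_frac)
    return n_train, n_total - n_train


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true, scores, threshold=0.5):
    """F1 = harmonic mean of precision and recall at an assumed threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Cohort figures taken from the abstract.
    print(split_sizes(1167))             # (934, 233)
    print(round(100 * 220 / 1167, 2))    # 18.85

    # Hypothetical toy labels and predicted risks, not study data.
    y = [1, 1, 0, 0, 0]
    risk = [0.9, 0.4, 0.6, 0.2, 0.1]
    print(roc_auc(y, risk), f1_score(y, risk))
```

With roughly 19% positive cases, AUC and F1 answer different questions: AUC summarizes ranking quality across all thresholds, whereas F1 depends on the chosen cut-off and on how well the minority (hypoglycemia) class is captured, which is one plausible reason for evaluating both together.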