The predictive model and risk factor identification for peripheral vascular disease and diabetic foot in diabetes based on machine learning models and explainable algorithms

Abstract

Diabetic peripheral vascular disease (DPVD) and diabetic foot (DF) are major disabling complications of diabetes that severely impair patients' quality of life. This study first collected cross-sectional data from 1240 patients with type 2 diabetes and its complications in the Departments of Vascular Surgery and Endocrinology of the Second Affiliated Hospital of Zhejiang University School of Medicine. During pre-processing, samples with severe data loss were excluded, and the remaining data were processed with methods such as MICEforest. Random forest (RF), support vector machine (SVM), backpropagation neural network (BPNN), extreme gradient boosting (XGBoost), and SHapley Additive exPlanations (SHAP) were then used to rank the importance of the 27 indicators, and the entropy weight method was applied to assign comprehensive weights to all indicators. Finally, a backpropagation neural network optimized by a genetic algorithm (GA-BPNN) was introduced to build a prediction model for diabetic complications. In addition, SHAP was applied to obtain the weight and importance ranking of each risk factor in this model. The comprehensive weighting approach identified the top 17 key indicators. Among the 5 classification models evaluated, GA-BPNN performed best on both classification tasks: diabetes versus DPVD (G1) and DPVD versus DF (G2). It achieved areas under the receiver operating characteristic curve (AUC) of 0.79 and 0.89, accuracies of 0.78 and 0.80, and F1-scores of 0.77 and 0.83, respectively. Furthermore, hypothesis testing showed that indicators such as fibrinogen and C-reactive protein differ significantly between groups. SHAP feature importance analysis likewise highlighted the strong influence of these features in identifying diabetic complications. GA-BPNN can therefore be employed as a prediction model for DPVD and DF.
In feature selection, the comprehensive weighting method and SHAP analysis identified the key features. In summary, this study constructed a comprehensive prediction model based on machine learning and interpretable algorithms. The model integrates diabetes-specific indicators, traditional cardiovascular risk factors, coagulation function, inflammatory markers, and cardiac structural parameters. It can effectively identify patients at high risk of diabetic complications and uncover potential risk features, thereby supporting subsequent efforts to reduce the incidence of these complications.
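The abstract does not say exactly how the entropy weight method combines the outputs of the five ranking methods. One plausible reading is assumed here. The 27 indicators are treated as alternatives and the ranking methods as criteria, so each method's non-negative importance scores (for example, mean absolute SHAP values) form one column. Columns whose scores vary more across indicators receive more weight. Each indicator's composite score is then the weighted sum of its normalized importances. This is a minimal sketch under that assumption, not the authors' code; `entropy_weights`, `composite_ranking`, and the toy matrix are hypothetical.

```python
import math
from typing import List, Tuple


def entropy_weights(matrix: List[List[float]]) -> List[float]:
    """Entropy weight of each column (criterion) of a non-negative matrix.

    Rows are alternatives (assumed: clinical indicators); columns are criteria
    (assumed: importance scores from RF, SVM, BPNN, XGBoost, SHAP).
    Requires at least two rows and a positive sum in every column.
    """
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)  # scales each column entropy into [0, 1]
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        entropy = 0.0
        for x in col:
            p = x / total  # share of criterion j held by this alternative
            if p > 0:  # convention: 0 * ln(0) = 0
                entropy -= p * math.log(p)
        divergences.append(1.0 - k * entropy)  # degree of diversification
    s = sum(divergences)
    if s <= 0:  # every column is (numerically) uniform: use equal weights
        return [1.0 / n] * n
    return [d / s for d in divergences]


def composite_ranking(matrix: List[List[float]]) -> Tuple[List[float], List[int]]:
    """Weighted composite score per row, plus row indices sorted best-first."""
    w = entropy_weights(matrix)
    col_totals = [sum(row[j] for row in matrix) for j in range(len(w))]
    scores = [
        sum(w[j] * row[j] / col_totals[j] for j in range(len(w)))
        for row in matrix
    ]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return scores, order


# Hypothetical importances of 4 indicators under 2 ranking methods.
importances = [[1, 4], [1, 1], [1, 1], [1, 2]]
weights = entropy_weights(importances)
scores, order = composite_ranking(importances)
```

In the toy matrix, the first method scores every indicator equally, so it carries no information. It receives a weight of essentially zero, and the ranking is driven by the second method, which places indicator 0 first and indicator 3 second. Keeping the top 17 indicators would then amount to taking `order[:17]` on the real 27-row matrix.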
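The abstract names GA-BPNN but gives neither its architecture nor its genetic-algorithm settings. A common formulation, assumed here, uses a genetic algorithm to search for good initial weights and thresholds of a backpropagation network, which gradient descent then fine-tunes. The NumPy sketch below follows that pattern for one binary task such as G1 or G2. It uses one tanh hidden layer, a sigmoid output, truncation selection, uniform crossover, and Gaussian mutation. All function names, operators, and hyperparameters (`pop_size`, `generations`, `mut_scale`, `lr`, `epochs`) are hypothetical choices, not the authors' settings.

```python
import numpy as np

# Hypothetical architecture: n_in inputs -> n_hidden tanh units -> 1 sigmoid output.


def n_params(n_in: int, n_hidden: int) -> int:
    return n_in * n_hidden + n_hidden + n_hidden + 1


def unpack(theta: np.ndarray, n_in: int, n_hidden: int):
    """Split a flat parameter vector into W1, b1, w2, b2 (views into theta)."""
    i = n_in * n_hidden
    W1 = theta[:i].reshape(n_in, n_hidden)
    b1 = theta[i:i + n_hidden]
    w2 = theta[i + n_hidden:i + 2 * n_hidden]
    b2 = theta[i + 2 * n_hidden]
    return W1, b1, w2, b2


def forward(theta, X, n_hidden):
    """Return hidden activations H and predicted probabilities p."""
    W1, b1, w2, b2 = unpack(theta, X.shape[1], n_hidden)
    H = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(H @ w2 + b2)))
    return H, p


def bce(y, p):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def ga_init(X, y, n_hidden, pop_size=30, generations=40, mut_scale=0.1, seed=0):
    """Genetic search for initial weights; fitness is the training BCE (lower is better)."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(0.0, 1.0, size=(pop_size, n_params(X.shape[1], n_hidden)))

    def losses(population):
        return np.array([bce(y, forward(ind, X, n_hidden)[1]) for ind in population])

    n_elite = pop_size // 2
    for _ in range(generations):
        elite = pop[np.argsort(losses(pop))[:n_elite]]  # truncation selection
        n_child = pop_size - n_elite
        a = elite[rng.integers(0, n_elite, n_child)]  # random elite parents
        b = elite[rng.integers(0, n_elite, n_child)]
        children = np.where(rng.random(a.shape) < 0.5, a, b)  # uniform crossover
        children = children + rng.normal(0.0, mut_scale, size=children.shape)  # mutation
        pop = np.vstack([elite, children])
    return pop[np.argmin(losses(pop))]


def backprop(theta, X, y, n_hidden, lr=0.1, epochs=300):
    """Full-batch gradient descent on BCE, starting from the GA's best individual."""
    theta = theta.copy()
    history = []
    for _ in range(epochs):
        _, _, w2, _ = unpack(theta, X.shape[1], n_hidden)
        H, p = forward(theta, X, n_hidden)
        history.append(bce(y, p))
        d_out = (p - y) / len(y)                    # dL/dz at the sigmoid output
        d_hid = np.outer(d_out, w2) * (1 - H ** 2)  # back through tanh
        grad = np.concatenate([
            (X.T @ d_hid).ravel(), d_hid.sum(axis=0),  # gradients for W1, b1
            H.T @ d_out, [d_out.sum()],                # gradients for w2, b2
        ])
        theta -= lr * grad
    return theta, history


def predict(theta, X, n_hidden, threshold=0.5):
    return (forward(theta, X, n_hidden)[1] >= threshold).astype(int)
```

In use, `ga_init` would run on the training split of the selected indicators, and `backprop` would continue from its result. `predict`, or the raw probabilities from `forward`, would then feed the evaluation metrics on a held-out split. Using the genetic algorithm only for initialization keeps the population search short and leaves fine-tuning to exact gradients.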
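The AUC, accuracy, and F1-scores reported for G1 and G2 can be computed from a model's predicted probabilities as sketched below. AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores above a randomly chosen negative case, with ties counted as one half. Accuracy and F1 use an assumed decision threshold of 0.5, and F1 is computed for the positive class. The abstract states neither the threshold nor the averaging scheme, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(y_true: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Accuracy and positive-class F1 after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    acc = sum(1 for y, p in zip(y_true, preds) if y == p) / len(preds)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1
```

The pairwise AUC loop is quadratic in the number of cases. That is fine for a cohort of about 1240 patients, but a sort-based rank formula scales better for larger data.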
