Development and validation of a machine learning model for cardiovascular disease risk prediction in type 2 diabetes patients

Abstract

Patients with type 2 diabetes mellitus (T2DM) have a significantly higher risk of cardiovascular disease (CVD) than the general population. Accurately predicting this risk is crucial for developing personalized treatment plans and public health interventions. This study aimed to develop and validate a model for predicting CVD risk in T2DM patients using the Boruta feature selection algorithm and machine learning methods. We analyzed data from the National Health and Nutrition Examination Survey (NHANES) from 1999 to 2018. Six machine learning (ML) models were developed and validated: Multilayer Perceptron (MLP), Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), and k-Nearest Neighbors (KNN). Boruta was used for feature selection. Model performance was evaluated using ROC curves, accuracy, and other related metrics. Shapley Additive Explanation (SHAP) analysis was conducted for visual interpretation, and the Shinyapps.io platform was used to deploy the best-performing model as a web-based application. A total of 4,015 T2DM patients were included, of whom 999 (24.9%) had CVD. Model evaluation revealed significant overfitting with the KNN algorithm, which showed perfect discrimination in the training set but performed poorly in the test set (AUC = 0.64). In contrast, XGBoost performed more consistently across the training and test sets (AUC = 0.75 and 0.72, respectively), indicating better generalization and making it more suitable for clinical application. The top 10 influencing factors identified by SHAP analysis of the XGBoost model were used to construct a CVD risk prediction platform for T2DM patients. The prediction model based on Boruta feature selection and machine learning shows promising results in assessing CVD risk among T2DM patients. This study provides a viable tool for clinical use, facilitating early intervention and precision treatment.
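The overfitting comparison in the abstract rests on the gap between training and test AUC. The sketch below is illustrative only and is not the authors' code: it computes AUC from its rank-based definition using the standard library, then applies a train-test gap check to the four AUC values reported above. The helper names `roc_auc` and `generalization_gap` and the 0.10 cutoff are assumptions introduced here; the study does not state a numeric threshold.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def generalization_gap(train_auc: float, test_auc: float) -> float:
    """Difference between training and test AUC; large values suggest overfitting."""
    return train_auc - test_auc


# (train AUC, test AUC) as reported in the abstract; KNN's "perfect
# discrimination" on the training set is taken to mean AUC = 1.00.
reported = {"KNN": (1.00, 0.64), "XGBoost": (0.75, 0.72)}

# Hypothetical cutoff chosen for illustration; not taken from the study.
GAP_THRESHOLD = 0.10

for name, (train_auc, test_auc) in reported.items():
    gap = generalization_gap(train_auc, test_auc)
    verdict = "possible overfitting" if gap > GAP_THRESHOLD else "consistent"
    print(f"{name}: train={train_auc:.2f} test={test_auc:.2f} gap={gap:.2f} -> {verdict}")
```

With these inputs the KNN gap is 0.36 and the XGBoost gap is 0.03, which matches the abstract's conclusion that XGBoost generalizes better. In practice a library routine such as scikit-learn's `roc_auc_score` would replace the quadratic-time helper shown here.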
