Machine Learning-Based Prediction Model Construction for Type 2 Diabetes Mellitus: A Comparison of Algorithms and Multilevel Risk Factor Analysis

Abstract

BACKGROUND: Type 2 diabetes mellitus (T2DM) has a high global incidence, yet existing prediction models are largely confined to single-dimensional risk factors and lack multilevel integrated analysis. Given the severe impact of T2DM on individual health and healthcare systems, a comprehensive and accurate prediction model is needed.
OBJECTIVE: This study aimed to construct a T2DM prediction model, identify multilevel risk factors, and enable early screening, so as to help clinicians identify high-risk individuals and provide targets for public health interventions.
METHODS: Data from the National Health and Nutrition Examination Survey (NHANES) 2021-2023 were used, including 6337 participants aged 18 years and older. Missing values were handled using Monte Carlo multiple imputation, collinearity was reduced via principal component analysis (PCA), and feature selection was performed using random forest (RF) and recursive feature elimination (RFE). The adaptive synthetic sampling (ADASYN) method was applied to address class imbalance. The performance of seven machine learning models, including decision tree, random forest, extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost), was compared.
RESULTS: The AdaBoost model performed best, with an area under the curve (AUC) of 0.85 (95% confidence interval: 0.85-0.86), an accuracy of 0.71 (95% confidence interval: 0.70-0.72), and an F1 score of 0.71; its performance improved further after parameter optimization. A total of 24 key risk factors were identified: 19 at the individual trait level, 3 at the individual behavior level, and 2 related to working and living conditions.
CONCLUSIONS: Machine learning models integrating multidimensional risk factors based on the health ecology framework can more accurately predict T2DM risk, providing a scientific basis for multilevel interventions. The innovation of this study lies in the first integration of the health ecology model with machine learning technology to systematically identify cross-level risk factors. Compared with traditional models, it is more comprehensive, breaks through the limitations of previous studies, and provides a new and effective tool for the precise prevention of T2DM and public health interventions.
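
The RESULTS above report an AUC together with a 95% confidence interval. The abstract does not say how that interval was obtained, so the following is only a minimal sketch of one common approach, a percentile bootstrap, written in plain Python. The function names `roc_auc` and `bootstrap_auc_ci`, the resample count, and the synthetic labels and scores are all illustrative assumptions, not the study's code or data.

```python
import random


def roc_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity; tied scores share an average rank."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    roc_auc(y_true, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class, AUC undefined
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


# Synthetic example: about 30% positives, scores shifted upward for positives.
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(300)]
preds = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
auc = roc_auc(labels, preds)
ci_low, ci_high = bootstrap_auc_ci(labels, preds, n_boot=200)
print(f"AUC = {auc:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

In a real evaluation the labels and scores would come from held-out predictions rather than synthetic draws. The interval narrows as the evaluation set grows, which is consistent with a narrow interval on a sample of several thousand participants.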
