Application of generalized linear mixed effects random forest for identifying risk factors of prediabetes in Tehran Lipid and Glucose Study

Abstract

Prediabetes is a major risk factor for diabetes, defined by blood glucose levels that are elevated but do not yet meet the diagnostic criteria for diabetes mellitus. The condition is often clinically "silent", yet it can already harm various organ systems and frequently signals the impending onset of type 2 diabetes mellitus. This study compared a traditional statistical model, the Generalized Linear Mixed Model (GLMM), with two tree-based machine learning models, Random Forest (RF) and Generalized Mixed-Effects Random Forest (GMERF), for predicting prediabetes and identifying key risk indicators in longitudinal data. The study sample included 5361 individuals aged over 20 years, with 32 variables considered. The target variable was the presence of prediabetes, measured longitudinally. We applied three models: RF, which is tree-based but does not account for repeated measurements; GLMM, which handles random effects but assumes that predictors act linearly on the link scale; and GMERF, a hybrid model that combines random effects with the nonlinearity of decision trees. Model performance was evaluated using standard predictive metrics. GMERF achieved the highest predictive performance: the area under the ROC curve (AUC) was 0.63 for RF, 0.70 for GLMM, and 0.74 for GMERF. In the GMERF model, the top five predictive variables were Waist-to-Hip Ratio (WHR), age, waist circumference, triglyceride level, and Waist-to-Height Ratio (WHtR). WHR was ranked as the most important feature in both the GMERF and RF models. All of these variables except WHtR were also significant in the GLMM. In longitudinal data, observations collected from the same individual over time are inherently dependent. Models that account for this dependence are better equipped to handle such data and yield more reliable and accurate predictions.
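To make the reported metric and the two derived anthropometric predictors concrete, the following is a minimal Python sketch. It is not the study's code. The function names, the pairwise AUC implementation, and the toy values are all illustrative assumptions. The AUC is computed as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Sequence


def waist_to_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """WHR: waist circumference divided by hip circumference (same units)."""
    return waist_cm / hip_cm


def waist_to_height_ratio(waist_cm: float, height_cm: float) -> float:
    """WHtR: waist circumference divided by height (same units)."""
    return waist_cm / height_cm


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = prediabetes, 0 = no prediabetes.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]  # made-up predicted probabilities

print(roc_auc(toy_labels, toy_scores))      # 3 of 4 pairs ranked correctly
print(waist_to_hip_ratio(90.0, 100.0))      # hypothetical measurements in cm
print(waist_to_height_ratio(90.0, 180.0))   # hypothetical measurements in cm
```

On this scale, 0.5 corresponds to ranking cases no better than chance and 1.0 to perfect ranking, which places the reported values of 0.63, 0.70, and 0.74 in context. The pairwise loop is quadratic in sample size and is meant for illustration only.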
