Prediction of Insulin Resistance in Nondiabetic Population Using LightGBM and Cohort Validation of Its Clinical Value: Cross-Sectional and Retrospective Cohort Study



Abstract

BACKGROUND: Insulin resistance (IR), a precursor to type 2 diabetes and a major risk factor for various chronic diseases, is becoming increasingly prevalent in China due to population aging and unhealthy lifestyles. Current assessment methods, including the gold-standard hyperinsulinemic-euglycemic clamp, have limitations in practical application. More convenient and efficient methods to predict and manage IR in nondiabetic populations would be valuable for prevention and control.

OBJECTIVE: This study aimed to develop and validate a machine learning prediction model for IR in a nondiabetic population, using low-cost diagnostic indicators and questionnaire surveys.

METHODS: A cross-sectional study was conducted for model development, and a retrospective cohort study was used for validation. Data from 17,287 adults with normal fasting blood glucose who underwent physical exams and completed surveys at the Health Management Center of Xiangya Third Hospital, Central South University, from January 2018 to August 2022, were analyzed. IR was assessed using the Homeostasis Model Assessment (HOMA-IR) method. The dataset was split into 80% (13,128/16,411) for training and 20% (3283/16,411) for testing. Five machine learning algorithms were used: random forest, Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting, Gradient Boosting Machine, and CatBoost. Model optimization included resampling, feature selection, and hyperparameter tuning. Performance was evaluated using F1-score, accuracy, sensitivity, specificity, area under the curve (AUC), and Kappa value. Shapley Additive Explanations analysis was used to assess feature importance. To investigate clinical implications, a separate retrospective cohort of 20,369 nondiabetic participants (from the Xiangya Third Hospital database between January 2017 and January 2019) was used for time-to-event analysis with Kaplan-Meier survival curves.

RESULTS: Data from 16,411 nondiabetic individuals were analyzed. We randomly selected 13,128 participants for the training group and 3283 participants for the validation group. The final model included 34 lifestyle-related questionnaire features and 17 biochemical markers. In the validation group, the AUCs of all models were greater than 0.90. In the test group, all AUCs were also greater than 0.80. The LightGBM model showed the best IR prediction performance, with an accuracy of 0.7542, sensitivity of 0.6639, specificity of 0.7642, F1-score of 0.6748, Kappa value of 0.3741, and AUC of 0.8456. The top 10 features were BMI, fasting blood glucose, high-density lipoprotein cholesterol, triglycerides, creatinine, alanine aminotransferase, sex, total bilirubin, age, and albumin/globulin ratio. In the validation cohort, all participants were classified into a high-risk IR group or a low-risk IR group by the LightGBM algorithm. Of the 5101 high-risk IR participants, 235 (4.6%) developed diabetes, compared with 137 (0.9%) of the 15,268 low-risk IR participants. This corresponded to a hazard ratio of 5.1, indicating a significantly higher risk in the high-risk IR group.

CONCLUSIONS: By leveraging low-cost laboratory indicators and questionnaire data, the LightGBM model effectively predicts IR status in nondiabetic individuals, aiding large-scale IR screening and diabetes prevention. It may become an efficient and practical tool for insulin sensitivity assessment in these settings.
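
The arithmetic behind two quantities in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the commonly used HOMA-IR formula (fasting insulin in µU/mL times fasting glucose in mmol/L, divided by 22.5); the abstract does not state the formula or the cutoff the authors applied, so no cutoff is given here. The function and variable names are hypothetical. The second part recomputes the crude cumulative incidence of diabetes in each risk group from the reported counts; the crude ratio of the two is a simple risk ratio, not the hazard ratio of 5.1 reported from the time-to-event analysis, although the two are close.

```python
def homa_ir(fasting_insulin_uU_ml: float, fasting_glucose_mmol_l: float) -> float:
    """Standard HOMA-IR formula (assumed; the abstract does not spell it out)."""
    return fasting_insulin_uU_ml * fasting_glucose_mmol_l / 22.5


def cumulative_incidence(events: int, total: int) -> float:
    """Crude proportion of participants who developed the outcome."""
    return events / total


# Counts reported in the abstract for the validation cohort.
high_risk_incidence = cumulative_incidence(235, 5101)
low_risk_incidence = cumulative_incidence(137, 15268)

# Crude risk ratio: ignores follow-up time and censoring, so it is only
# an approximation of the hazard ratio from a survival model.
crude_risk_ratio = high_risk_incidence / low_risk_incidence

if __name__ == "__main__":
    # Hypothetical lab values, chosen only to show the formula's units.
    print(f"HOMA-IR example: {homa_ir(10.0, 4.5):.2f}")
    print(f"High-risk incidence: {100 * high_risk_incidence:.1f}%")
    print(f"Low-risk incidence: {100 * low_risk_incidence:.1f}%")
    print(f"Crude risk ratio: {crude_risk_ratio:.2f}")
```

The group percentages reproduce the 4.6% and 0.9% given in the abstract, which is a quick consistency check on the reported counts.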
