A machine learning approach for type 2 diabetes diagnosis and prognosis using tailored heterogeneous feature subsets



Abstract

Type 2 diabetes (T2D) is becoming one of the leading health problems in Western societies, diminishing quality of life and consuming a significant share of healthcare resources. This study presents machine learning models for T2D diagnosis and prognosis, developed using heterogeneous data from a Spanish population dataset (Di@bet.es study). The models were trained exclusively on individuals classified as controls and undiagnosed diabetics, ensuring that the results are not influenced by treatment effects or behavioral changes due to disease awareness. Two data domains are considered: environmental (patient lifestyle questionnaires and measurements) and clinical (biochemical and anthropometric measurements). The preprocessing pipeline consists of four key steps: geospatial data extraction, feature engineering, missing data imputation, and quasi-constancy filtering. Two working scenarios (Environmental and Healthcare) are defined based on the features used, and applied to two targets (diagnosis and prognosis), resulting in four distinct models. The feature subsets that best predict the target have been identified based on permutation importance and sequential backward selection, reducing the number of features and, consequently, the cost of predictions. In the Environmental scenario, models achieved an AUROC of 0.86 for diagnosis and 0.82 for prognosis. The Healthcare scenario performed better, with an AUROC of 0.96 for diagnosis and 0.88 for prognosis. A partial dependence analysis of the most relevant features is also presented. An online demo page showcasing the Environmental and Healthcare T2D prognosis models is available upon request.
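The feature-reduction idea described in the abstract (imputation, quasi-constancy filtering, permutation importance, sequential backward selection, AUROC evaluation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the data are synthetic rather than from the Di@bet.es study, logistic regression stands in for whatever learner the authors used, median imputation and the 0.01 variance threshold are arbitrary choices, and the geospatial extraction and feature engineering steps are omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector, VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def select_features(X, y, n_keep=4, seed=0):
    """Impute, filter quasi-constant columns, rank by permutation importance,
    then shrink the feature set with sequential backward selection."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Missing-data imputation followed by quasi-constancy filtering
    # (both fitted on the training split only).
    prep = make_pipeline(
        SimpleImputer(strategy="median"),
        VarianceThreshold(threshold=0.01),
    )
    X_train_p = prep.fit_transform(X_train)
    X_test_p = prep.transform(X_test)
    kept = prep[-1].get_support(indices=True)  # original column indices that survive

    # Permutation importance: drop in AUROC when one column is shuffled.
    model = LogisticRegression(max_iter=1000).fit(X_train_p, y_train)
    perm = permutation_importance(
        model, X_test_p, y_test, scoring="roc_auc", n_repeats=5, random_state=seed
    )
    ranking = kept[np.argsort(perm.importances_mean)[::-1]]

    # Sequential backward selection: start from all surviving features and
    # repeatedly remove the one whose removal hurts cross-validated AUROC least.
    sbs = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=n_keep,
        direction="backward",
        scoring="roc_auc",
        cv=3,
    )
    sbs.fit(X_train_p, y_train)
    selected = kept[sbs.get_support()]

    # Refit on the reduced subset and evaluate on held-out data.
    final = LogisticRegression(max_iter=1000).fit(sbs.transform(X_train_p), y_train)
    scores = final.predict_proba(sbs.transform(X_test_p))[:, 1]
    auroc = roc_auc_score(y_test, scores)
    return selected, ranking, auroc


# Synthetic stand-in data: 10 features (4 informative) plus one quasi-constant column.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, class_sep=1.5, random_state=0,
)
quasi_constant = np.zeros((X.shape[0], 1))
quasi_constant[:3] = 1.0  # only 3 of 600 rows differ
X = np.hstack([X, quasi_constant])

rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject ~5% missing values

selected, ranking, auroc = select_features(X, y)
print("selected feature indices:", selected)
print("permutation-importance ranking:", ranking)
print(f"held-out AUROC on reduced subset: {auroc:.2f}")
```

Fewer retained features mean fewer questionnaire items or laboratory measurements are needed per prediction, which is the cost argument the abstract makes. The AUROC printed by this sketch reflects only the synthetic data and is not comparable to the figures reported in the study.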
