Abstract
BACKGROUND: Substantial metabolic heterogeneity exists prior to the development of diabetes, creating opportunities for earlier and more precise intervention. This study aimed to analyze common clinical indicators in a diabetes-free population using innovative clustering methods to identify characteristic subgroups and evaluate their utility in stratified prediction of diabetes risk and related complications. METHODS: This analysis included 13,829 adults without diabetes from the Kunshan Aging Research with E-health (KARE) cohort, a population-based longitudinal cohort of 51,400 community-dwelling residents from both urban and rural areas of Kunshan City, China, who received annual health examinations between January 2014 and December 2023. A novel subtype classification method based on complication clustering and weighted naive Bayes classification was applied to select the most informative variables and categorize individuals into distinct subtypes. We then assessed 3-year risks of diabetes and complications, including cardiovascular disease (CVD), fatty liver disease (FLD), and stroke. To evaluate the influence of genetic factors, polygenic risk scores (PRS) were compared across all participants. External validation was performed using data from 6209 diabetes-free individuals in a cohort of 22,630 people who have been followed since 2014 at Beijing Jiuhua Hospital. RESULTS: Thirteen clinically relevant variables were identified: sex, age, body mass index (BMI), waist circumference, triglycerides (TG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), alanine aminotransferase (ALT), uric acid (UA), blood urea nitrogen (BUN), fasting blood glucose (FBG), systolic blood pressure (SBP), and heart rate. Three clusters were identified in the Kunshan cohort.
Cluster 1 (n = 6751) had favorable indicators and the lowest risks of diabetes (2.04%, 138/6751) and complications, including CVD (4.52%, 305/6751), FLD (15.30%, 1033/6751), and stroke (9.07%, 612/6751). Cluster 2 (n = 4622) had the poorest glucose and lipid control, with the highest 3-year cumulative incidence of diabetes (9.95%, 460/4622) and FLD (52.14%, 2410/4622). Cluster 3 (n = 2456) was characterized by the oldest age and the highest SBP, BMI, and waist circumference, with intermediate diabetes risk (3.05%, 75/2456) and the highest risks of CVD (8.47%, 208/2456) and stroke (14.13%, 347/2456). In the adjusted Cox survival analysis for FLD, with Cluster 1 as the reference, the hazard ratio (HR) was 2.357 (95% confidence interval [CI]: 2.161-2.571, P < 0.001) for Cluster 2 and 1.903 (95% CI: 1.718-2.108, P < 0.001) for Cluster 3. For CVD, the HRs were 1.193 (95% CI: 0.975-1.459, P = 0.087) for Cluster 2 and 1.295 (95% CI: 1.041-1.611, P = 0.02) for Cluster 3. For stroke, the HRs were 1.058 (95% CI: 0.911-1.23, P = 0.46) for Cluster 2 and 1.212 (95% CI: 1.029-1.428, P = 0.021) for Cluster 3. The risks of diabetes and CVD predicted by PRS were consistent with those identified by clinical clustering. The findings were independently confirmed in the Beijing Jiuhua Hospital cohort. CONCLUSIONS: Phenotypes derived from clinical characteristic analysis using the new clustering method effectively identify and stratify the risk of diabetes and related complications, such as CVD, FLD, and stroke, in two large cohorts of diabetes-free Chinese adults, supporting the development of more precise and individualized prevention strategies.
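
The 3-year cumulative incidences reported above are simple proportions (events divided by cluster size), so they can be rechecked from the counts given in the abstract. The following is a minimal sketch of that arithmetic only; the names `CLUSTERS`, `cumulative_incidence_pct`, and `incidence_table` are illustrative and not part of the study's analysis code, and the sketch does not reproduce the adjusted Cox models. Counts not stated in the abstract (CVD and stroke events in Cluster 2, FLD events in Cluster 3) are left out rather than inferred.

```python
# Counts copied from the abstract. Outcomes whose event counts are not
# reported there are intentionally omitted.
CLUSTERS = {
    "Cluster 1": {"n": 6751,
                  "events": {"diabetes": 138, "CVD": 305, "FLD": 1033, "stroke": 612}},
    "Cluster 2": {"n": 4622,
                  "events": {"diabetes": 460, "FLD": 2410}},
    "Cluster 3": {"n": 2456,
                  "events": {"diabetes": 75, "CVD": 208, "stroke": 347}},
}


def cumulative_incidence_pct(events: int, n: int) -> float:
    """Crude cumulative incidence as a percentage, rounded to two decimals."""
    return round(100.0 * events / n, 2)


def incidence_table(clusters: dict) -> dict:
    """Map each cluster to its outcome -> cumulative incidence (%) table."""
    return {
        name: {
            outcome: cumulative_incidence_pct(count, info["n"])
            for outcome, count in info["events"].items()
        }
        for name, info in clusters.items()
    }


if __name__ == "__main__":
    # Print one line per cluster and outcome for comparison with the text.
    for name, outcomes in incidence_table(CLUSTERS).items():
        for outcome, pct in outcomes.items():
            print(f"{name} {outcome}: {pct:.2f}%")
```

These crude proportions are unadjusted and ignore follow-up time, so they are not expected to match the hazard ratios from the Cox analyses.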