Abstract
Identifying new subgroups among children and adolescents with obesity and metabolic syndrome requires clustering techniques capable of analyzing complex multidimensional data. This study aimed to use machine learning methods to improve the classification of obesity and metabolic syndrome subgroups in youth, facilitating early detection and targeted intervention strategies. Data were derived from three nationwide, multicenter, school-based CASPIAN studies conducted in Iran during 2003-2004, 2009-2010, and 2015. After excluding metabolically healthy non-obese participants, the final sample included 382, 787, and 594 individuals aged 7-10, 11-14, and 15-18 years, respectively. Metabolic syndrome (MetS) status was defined according to Adult Treatment Panel III criteria. An unsupervised machine learning method, the Gaussian Mixture Model (GMM), was applied to the top five principal components in each age group, and the optimal number of clusters was selected using the Davies-Bouldin index. Clinical features associated with metabolism and obesity were then analyzed within each cluster. In the 7-10-year group, six distinct clusters were identified based on key metabolic and anthropometric variables. The 11-14-year group yielded seven clusters, each with unique metabolic and anthropometric characteristics. For adolescents aged 15-18 years, six clusters reflected a more pronounced interaction between anthropometric measures and metabolic risk factors, consistent with physiological maturation. Stability tests showed mean clustering accuracies of 76.3%, 65.5%, and 52% for the three age groups, respectively. Predictability tests showed an average accuracy exceeding 87% across all groups, indicating the robustness and reliability of the clustering approach.
This study demonstrated that machine learning can uncover hidden metabolic and anthropometric heterogeneity in pediatric obesity, providing a methodological framework for identifying meaningful subgroups for targeted interventions.
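To illustrate how the Davies-Bouldin index can guide the choice of cluster count, the following is a minimal standard-library sketch. It is not the study's implementation: the function names `davies_bouldin` and `pick_best_labeling` are hypothetical, Euclidean distance is assumed, and the candidate labelings are assumed to come from hard assignments of a fitted mixture model (for example, one GMM per candidate cluster count). Lower index values indicate more compact, better-separated clusters.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard cluster assignment (lower is better)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    if len(groups) < 2:
        raise ValueError("the index is undefined for fewer than two clusters")

    centroids, scatters = {}, {}
    for label, members in groups.items():
        dim = len(members[0])
        centroid = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
        centroids[label] = centroid
        # Scatter: mean Euclidean distance of members to their centroid.
        scatters[label] = sum(math.dist(m, centroid) for m in members) / len(members)

    worst_ratios = []
    for i in groups:
        ratios = []
        for j in groups:
            if j == i:
                continue
            separation = math.dist(centroids[i], centroids[j])
            # Coinciding centroids mean no separation at all; treat as infinitely bad.
            ratios.append(
                math.inf if separation == 0
                else (scatters[i] + scatters[j]) / separation
            )
        # For each cluster, keep its worst (largest) similarity to another cluster.
        worst_ratios.append(max(ratios))
    return sum(worst_ratios) / len(worst_ratios)


def pick_best_labeling(points, candidate_labelings):
    """Return (k, score) for the candidate labeling with the lowest index.

    candidate_labelings maps a cluster count k to one label per point.
    """
    scores = {
        k: davies_bouldin(points, labels)
        for k, labels in candidate_labelings.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores[best_k]
```

In use, `points` would hold each participant's principal-component scores and `candidate_labelings` would map each candidate cluster count to the labels produced by the corresponding fitted model; the count with the lowest index is retained.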