Abstract
BACKGROUND: The concurrent rise of childhood obesity and hyperuricemia presents a serious public health concern. These conditions interact through complex metabolic mechanisms and significantly increase the long-term risk of cardiometabolic disease. Machine learning (ML) offers a framework for constructing risk prediction models in pediatric populations. OBJECTIVE: This study aimed to develop and evaluate two ML models, Random Forest (RF) and Support Vector Classification (SVC), to predict the risk of obesity with hyperuricemia in children by integrating clinical and biochemical variables. METHODS: A total of 101 children were enrolled, including 60 with obesity and 41 with obesity plus hyperuricemia. Data preprocessing involved recursive feature elimination (RFE), oversampling with ROSE (Random Over-Sampling Examples), and feature standardization. Both the RF and SVC models were trained and then evaluated using the area under the ROC curve (AUC), precision-recall curves, and calibration curves. SHapley Additive exPlanations (SHAP) analysis was conducted to interpret feature contributions. RESULTS: Both models demonstrated strong predictive performance, with AUCs reaching 0.96. The SVC model achieved slightly higher average precision and recall, making it more suitable for community- or school-based screening of high-risk children. In contrast, the RF model exhibited superior calibration, suggesting greater utility in clinical decision-making, where probabilistic risk estimates guide personalized follow-up or intervention planning. SHAP analysis identified glomerular filtration rate (GFR), high-density lipoprotein cholesterol (HDL-C), and apolipoprotein B (ApoB) as key predictors, some of which exhibited nonlinear associations with disease risk. CONCLUSION: RF and SVC models offer reliable tools for early risk prediction of obesity with hyperuricemia in children, each suited to a distinct clinical scenario. These findings support early identification and targeted intervention. Future studies will explore the integration of metabolomic data and ensemble approaches to further enhance model performance and clinical applicability.
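The distinction drawn in the results between discrimination (AUC) and calibration can be illustrated with a minimal sketch. It is not the study's code: the labels and scores are made-up toy values, the function names `roc_auc` and `brier_score` are hypothetical helpers, and the Brier score is used here as a simple single-number stand-in for the calibration curves reported in the study. The point is only that two models can rank children identically, and so share an AUC, while differing in how trustworthy their predicted probabilities are.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Toy data, NOT from the study: 1 = obesity plus hyperuricemia, 0 = obesity only.
y = [0, 0, 1, 0, 1, 0, 1, 1]

# Hypothetical model A: probabilities spread across the range.
probs_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# Hypothetical model B: same ranking, but every probability squeezed toward 0.5.
probs_b = [0.45 + 0.1 * p for p in probs_a]

auc_a, auc_b = roc_auc(y, probs_a), roc_auc(y, probs_b)
brier_a, brier_b = brier_score(y, probs_a), brier_score(y, probs_b)

print(f"AUC   A={auc_a:.4f}  B={auc_b:.4f}")
print(f"Brier A={brier_a:.4f}  B={brier_b:.4f}")
```

On this toy data both models have the same AUC (0.8125), but model A has the lower Brier score (about 0.175 versus about 0.236). A ranking-only use such as screening would treat the two as equivalent, whereas a use that reads the output as an individual risk estimate would favor model A.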