Development and external validation of an interpretable machine learning-based model for obesity risk prediction in 2-18-year-old children and adolescents in Beijing and Tangshan



Abstract

BACKGROUND: The multifactorial mechanisms driving childhood obesity, a global public health challenge, are yet to be fully elucidated. We aimed to develop and externally validate three widely applied machine learning models alongside logistic regression to predict obesity risk in 2-18-year-old children and adolescents in Beijing and Tangshan. We further sought to interpret the optimised model and translate it into a web-based tool to inform clinical decision-making.
METHODS: We analysed data on 19 024 (training/testing) and 2410 (external validation) children and adolescents from Beijing and Tangshan, respectively. Using a set of factors including demographic, familial, socioeconomic, lifestyle, and perinatal variables, we developed four models (light gradient boosting machine, random forest, eXtreme gradient boosting (XGBoost), and logistic regression) and compared their predictive performance. After validation, we selected an optimised model and interpreted it using SHapley Additive exPlanations (SHAP) analysis. We then developed an online calculator with interpretable visualisations to enable real-time risk assessment.
RESULTS: The XGBoost model exhibited superior performance, with an area under the receiver operating characteristic curve (AUROC) of 0.875 on the external validation set, significantly outperforming the logistic regression model (AUROC = 0.718). To identify the minimal feature subset that maintained model efficacy, we incrementally incorporated predictors in descending order of SHAP importance while assessing key performance metrics (accuracy, AUROC, and F-beta score). This SHAP-based analysis identified nine key predictors of childhood obesity: birth length, paternal body mass index (BMI), maternal BMI, sleep duration, physical activity, birth weight, maternal age at delivery, delivery mode, and gestational age. The deployed online tool provides individualised risk probabilities and SHAP-derived explanations.
CONCLUSIONS: The XGBoost model in our study was the superior ensemble learning method for predicting childhood obesity. The digital tool integrates this model and can help clinical practitioners determine individuals' risk of childhood obesity.
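
The incremental feature-selection step described in RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stopping rule (keep the shortest prefix of the ranked predictors whose score is within `tolerance` of the full-set score), the function names, the toy data, and the `evaluate` callback are all assumptions made for illustration. In the study, each subset would be evaluated with an XGBoost model and the ranking would come from SHAP importance values; here a simple sum of feature values stands in for the model so the example runs with the standard library alone.

```python
from typing import Callable, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fbeta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F-beta score computed from binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def minimal_subset(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    tolerance: float = 0.01,
) -> list:
    """Add features in descending importance; return the shortest prefix
    whose score is within `tolerance` of the full-set score (assumed rule)."""
    full_score = evaluate(ranked_features)
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        if evaluate(subset) >= full_score - tolerance:
            return subset
    return list(ranked_features)


# Toy data (hypothetical): three features, already ranked by importance.
rows = [
    {"a": 3, "b": 1, "c": 0}, {"a": 2, "b": 2, "c": 1}, {"a": 3, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1}, {"a": 2, "b": 0, "c": 0}, {"a": 0, "b": 1, "c": 0},
]
labels = [1, 1, 1, 0, 0, 0]
ranked = ["a", "b", "c"]


def evaluate(features: Sequence[str]) -> float:
    """Stand-in for 'retrain and score': the risk score is the sum of the selected features."""
    scores = [sum(row[f] for f in features) for row in rows]
    return auroc(labels, scores)


print(minimal_subset(ranked, evaluate))
```

In practice the `evaluate` callback would retrain the model on each candidate subset and score it on held-out data, and several metrics (accuracy, AUROC, F-beta) would be tracked together rather than a single one.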
