Identifying subjective life expectancy risk factors in physically active and inactive middle-aged and older adults using machine learning models

Abstract

BACKGROUND: Physical activity is a key focus of public health, and subjective life expectancy is closely associated with individuals' physical and psychological well-being. This study aimed to identify risk factors for subjective life expectancy among physically active and inactive middle-aged and older adults, and to provide an evidence base for differentiated health intervention strategies.

METHODS: Data came from the 2018 wave of the China Health and Retirement Longitudinal Study (CHARLS), with 10,945 participants included. Five machine learning models, namely Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were constructed separately for the active and inactive groups. To reduce bias caused by class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to generate synthetic samples for the minority class. The dataset was split into a training set (70%) and a testing set (30%), and ten-fold cross-validation combined with grid search was used to tune hyperparameters, with the aim of improving model robustness and generalizability. Model performance was evaluated using the Area Under the Receiver Operating Characteristic Curve (AUC), accuracy, sensitivity, specificity, and F1-score.

RESULTS: The active group (4,707 men and 4,885 women) had a mean age of 59.76 years, and the inactive group (662 men and 691 women) had a mean age of 63.00 years. The SVM model performed best in the inactive group (AUC: 0.797; accuracy: 0.722; sensitivity: 0.747), whereas the LightGBM model performed best in the active group (AUC: 0.775; accuracy: 0.745; specificity: 0.814). Feature importance analysis indicated that "age" was the most important variable in the SVM model, while "perceived health" was the most important variable in the LightGBM model.

CONCLUSION: Machine learning methods can effectively identify key risk factors influencing subjective life expectancy among middle-aged and older adults, and can guide targeted health management strategies tailored to populations with different levels of physical activity.
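The evaluation metrics named in the abstract (AUC, accuracy, sensitivity, specificity, F1-score) can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names `classification_metrics` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and none of the values come from CHARLS.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data, not study results).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which may be why the abstract reports AUC alongside threshold-based metrics for each group.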
