Predicting 3-year depressive symptoms among middle-aged and older adults in rural China using random forest: insights from the China health and retirement longitudinal study

Abstract

BACKGROUND: Under China’s urban-rural dual economic structure, rural regions face low socioeconomic status, inadequate healthcare resources, and neglect of mental health, leading to a higher prevalence of depression among middle-aged and older adults (aged 45 years and above) in these areas.
METHODS: This prospective cohort study used data from 6,183 rural Chinese middle-aged and older adults in the China Health and Retirement Longitudinal Study (CHARLS, 2018–2020). A random forest model was developed to predict 3-year incident depressive symptoms. Independent risk factors were identified via chi-square tests followed by binary logistic regression (Odds Ratios [ORs] and 95% Confidence Intervals [CIs] reported for significant variables, p < 0.05). The model’s performance and clinical utility were assessed using standard metrics and Decision Curve Analysis (DCA). SHapley Additive exPlanations (SHAP) values were used to quantify each feature’s impact on predictions. A subgroup analysis compared depression-related characteristics in middle-aged (45–59 years) versus older adults (≥ 60 years) with incident depressive symptoms.
RESULTS: Over the 3-year follow-up, 1,629 (26.35%) participants developed incident depressive symptoms. The random forest model was optimized using Recursive Feature Elimination (RF-RFE), which selected 28 key predictors from an initial 33. After the classification threshold was adjusted to maximize the F1-score (optimal threshold = 0.43), the model achieved an accuracy of 0.736, precision of 0.499, recall of 0.607, F1-score of 0.548, and an AUC of 0.776 (95% CI: 0.763–0.788). The mean Brier score was 0.163 ± 0.006. DCA confirmed the model’s clinical utility. Protective factors identified via logistic regression included being male, higher education, and internet access. Conversely, older age, poor self-rated health, lower life satisfaction, and functional limitations were significant risk factors for incident depressive symptoms.
CONCLUSION: The random forest model shows moderate ability to predict the 3-year risk of depressive symptoms in individuals aged 45 and above in rural China. It offers a potentially valuable screening tool for rural regions with low mental health awareness and high depression prevalence, enabling more targeted interventions and prevention strategies.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40359-025-03513-2.
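
The threshold adjustment and calibration metrics reported in the RESULTS can be illustrated with a minimal sketch. The abstract does not describe how the threshold search was implemented, so the grid search below is an assumption, the helper names (`precision_recall_f1`, `best_f1_threshold`, `brier_score`, `f1_from`) are hypothetical, and the labels and probabilities are made-up toy values rather than CHARLS data. The only figures taken from the abstract are the reported precision (0.499), recall (0.607), and F1-score (0.548), which the last helper checks for internal consistency.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary labels; 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_f1_threshold(y_true: Sequence[int], probs: Sequence[float],
                      thresholds: Sequence[float]) -> Tuple[float, float]:
    """Grid-search the cutoff that maximizes F1 (assumed procedure, not the paper's code).

    Ties are resolved in favor of the first (lowest) threshold in the grid.
    """
    best_t, best_f1 = thresholds[0], -1.0
    for t in thresholds:
        y_pred = [1 if p >= t else 0 for p in probs]
        _, _, f1 = precision_recall_f1(y_true, y_pred)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def f1_from(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data for illustration only (not from CHARLS).
    y_true = [0, 0, 1, 0, 1, 1, 0, 1]
    probs = [0.12, 0.33, 0.41, 0.47, 0.56, 0.62, 0.21, 0.83]
    grid = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95

    t, f1 = best_f1_threshold(y_true, probs, grid)
    print(f"best threshold on toy data: {t:.2f}, F1 = {f1:.3f}")
    print(f"Brier score on toy data: {brier_score(y_true, probs):.3f}")

    # Consistency check on the values reported in the abstract.
    print(round(f1_from(0.499, 0.607), 3))  # 0.548
```

Choosing the cutoff by F1 rather than using the default 0.5 trades some precision for recall, which matches the reported pattern of recall (0.607) exceeding precision (0.499) at a threshold below 0.5. In practice the threshold would be selected on validation data, not on the data used for the final evaluation.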
