Abstract
BACKGROUND: Depression is the third most disabling disease worldwide. The prevalence of depressive symptoms is as high as 11.5%-21.1% in China's middle-aged and elderly population and increases significantly with age. Identifying high-risk groups efficiently and implementing appropriate early interventions requires depression risk prediction models with better performance. METHODS: We used data from the China Health and Retirement Longitudinal Study (CHARLS, 2011-2020) to track the onset of depressive symptoms in adults aged over 45 who had no depressive symptoms at baseline. Participants were tracked over 9 years across four follow-ups. Eight machine-learning models were trained on the pre-resampling (original) data and on three types of resampled data. Their hyperparameters were optimized through grid search with tenfold cross-validation. Model performance was evaluated using the area under the ROC curve (AUC), precision, recall, and F1 score. Additionally, SHapley Additive exPlanations (SHAP) plots were used for interpretability. RESULTS: The cumulative incidence of depressive symptoms at the four follow-up time points was 19.043%, 22.554%, 27.416%, and 29.416%, respectively, with higher incidence among females, rural residents, those with low education, and residents of the western regions. The RandomUnderSampler-extreme gradient boosting (XGB) model performed best in predicting the 9-year risk of depressive symptoms (recall = 70.36%, F1 = 0.5605, AUC = 0.750). SHAP analysis showed that education level, cognitive ability, and life satisfaction were the core factors driving the prediction of depressive symptoms. CONCLUSIONS: The prevalence of depressive symptoms in China's middle-aged and elderly population is high, and the influencing factors are complex. When predicting depressive symptoms, the model should be selected according to the prediction needs; random undersampling combined with XGB is suitable for long-term risk prediction in large-scale populations.
For high-risk groups, interventions guided by accurate prediction can be used to reduce the risk of depressive symptoms.
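The resampling and evaluation steps named in METHODS can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the function names `random_undersample`, `precision_recall_f1`, and `implied_precision` are hypothetical, the toy labels are synthetic rather than CHARLS data, and the XGB model, grid search, and SHAP analysis are omitted. The sketch shows only how random undersampling balances classes and how precision, recall, and F1 relate to one another.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(indices) for indices in by_class.values())
    keep = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, n_min))  # sample without replacement
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    return f1 * recall / (2 * recall - f1)


# Synthetic toy labels: 30 positives and 70 negatives (illustrative only).
y = [1] * 30 + [0] * 70
X = [[i] for i in range(100)]
X_bal, y_bal = random_undersample(X, y, seed=42)
class_counts = Counter(y_bal)  # both classes now have 30 rows
```

Assuming the reported recall (0.7036) and F1 (0.5605) refer to the same positive class and decision threshold, `implied_precision(0.7036, 0.5605)` gives a precision of roughly 0.47. That trade-off is consistent with undersampling favoring recall over precision.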