Abstract
OBJECTIVE: To develop a predictive model for identifying depressive symptoms among middle-aged and elderly individuals in China. METHODS: Participants aged ≥ 45 years from the 2020 wave of the China Health and Retirement Longitudinal Study (CHARLS) were enrolled in this cross-sectional study. Depressive symptoms were defined as a score of 10 or higher on the CESD-10 scale, which has a maximum score of 30. A predictive model was developed using five selected machine learning algorithms, which were combined into a stacked ensemble. The model was trained and internally validated on the 2020 CHARLS cohort and externally validated on a questionnaire survey of middle-aged and elderly individuals in Shaanxi Province, China, conducted under the same criteria. SHapley Additive exPlanations (SHAP) were used to assess the importance of the predictive factors. RESULTS: For predicting depressive symptoms, the stacked ensemble model achieved an AUC of 0.8021 in the test set of the training cohort and an AUC of 0.7448 in the external validation cohort, outperforming all base models. CONCLUSION: The stacked ensemble approach is an effective tool for identifying depressive symptoms in a large population of middle-aged and elderly individuals in China. Life satisfaction, self-reported health, pain, sleep duration, and cognitive function were identified as highly important predictors.
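The outcome definition above can be illustrated with a minimal sketch. It assumes that the CESD-10 consists of 10 items each scored 0 to 3 (consistent with the stated maximum of 30) and that any reverse-scored items have already been recoded; the function and constant names are hypothetical and are not taken from the study's code.

```python
from typing import Sequence

# Assumptions (not stated in the abstract beyond the cutoff and maximum):
# 10 items, each scored 0-3, with reverse-scored items already recoded.
N_ITEMS = 10
ITEM_MAX = 3
CUTOFF = 10  # a total of 10 or higher is labeled as depressive symptoms


def cesd10_total(item_scores: Sequence[int]) -> int:
    """Return the CESD-10 total score (0-30) after validating the items."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if not 0 <= score <= ITEM_MAX:
            raise ValueError(f"item score {score} outside 0-{ITEM_MAX}")
    return sum(item_scores)


def has_depressive_symptoms(item_scores: Sequence[int]) -> bool:
    """Binary outcome label: True when the total reaches the cutoff."""
    return cesd10_total(item_scores) >= CUTOFF
```

Under these assumptions, a respondent who scores 1 on every item has a total of 10 and is labeled positive, while a total of 9 is labeled negative.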