Abstract
BACKGROUND: Predicting depression risk in adults is critical for timely interventions that improve quality of life. Building a scientific basis for depression prevention requires machine learning models that can assess depression risk from longitudinal data.
METHODS: Data from 2,331 healthy older adults who participated in the China Health and Retirement Longitudinal Study (CHARLS) from 2018 to 2020 were used to develop and validate the predictive model. Depression was assessed using the 10-item Center for Epidemiologic Studies Depression Scale (CES-D-10), with a score of ≥10 indicating depressive symptoms. Six machine learning algorithms (logistic regression, k-nearest neighbor, support vector machine, multilayer perceptron, decision tree, and XGBoost) were used to predict 2-year depression risk. The dataset was randomly split into a training set (70%) and a testing set (30%), and hyperparameters were optimized during the training phase. Model performance was evaluated on the testing set using accuracy, sensitivity, specificity, area under the receiver operating characteristic (ROC) curve, and F1 score. SHapley Additive exPlanations (SHAP) were used to interpret the model.
RESULTS: A total of 563 (24.15%) participants developed depression during the 2-year follow-up period. LASSO regression identified 12 key predictive features from an initial set of 26. Among the six models tested, XGBoost exhibited the best predictive performance, achieving the highest area under the ROC curve (0.774), accuracy (0.722), sensitivity (0.757), and F1 score (0.720), with a specificity of 0.687. Decision curve analysis (DCA) confirmed the net clinical benefit of the XGBoost model across most threshold ranges. SHAP interpretation revealed that cognitive ability, total income, life satisfaction, sleep quality, and pain were the five most influential factors in predicting depression risk.
CONCLUSION: Our findings support the feasibility of using machine learning-based models to predict depression risk in healthy older adults over a 2-year period. The integration of XGBoost and SHAP enhances model interpretability, offering valuable insights into individual risk factors. This approach enables personalized risk assessment, which may help develop targeted interventions for depression prevention in aging populations.
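As an illustration of the outcome definition and the evaluation metrics named in the Methods, the following minimal Python sketch shows how a CES-D-10 score might be converted to a binary label and how accuracy, sensitivity, specificity, F1 score, and area under the ROC curve could be computed from predictions. It is not the study's code: the function names, the 0.5 decision threshold, and the six toy scores and probabilities are assumptions made for illustration only and are not CHARLS data.

```python
from typing import Dict, Sequence

CESD10_CUTOFF = 10  # a CES-D-10 score >= 10 indicates depressive symptoms


def has_depressive_symptoms(cesd10_score: int) -> bool:
    """Binary outcome label from a CES-D-10 total score (hypothetical helper)."""
    return cesd10_score >= CESD10_CUTOFF


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _safe_div(tp, tp + fn)  # true positive rate (recall)
    specificity = _safe_div(tn, tn + fp)  # true negative rate
    precision = _safe_div(tp, tp + fp)
    return {
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outranks a negative one.

    Ties count as half a win; this is the pairwise (Mann-Whitney) form.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _safe_div(wins, len(pos) * len(neg))


# Toy values invented for illustration; they are NOT study data.
toy_cesd10 = [12, 4, 10, 9, 15, 3]
toy_probs = [0.8, 0.3, 0.4, 0.6, 0.9, 0.1]  # hypothetical model outputs

toy_labels = [int(has_depressive_symptoms(s)) for s in toy_cesd10]
toy_preds = [int(p >= 0.5) for p in toy_probs]  # 0.5 threshold is an assumption

toy_metrics = binary_metrics(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_probs)
```

In this sketch the AUC is computed from the predicted probabilities rather than the thresholded predictions, so it does not depend on the chosen cutoff, whereas accuracy, sensitivity, specificity, and F1 score all do.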