Abstract
BACKGROUND: Obesity significantly increases the risk of depression, yet interpretable depression risk prediction tools designed specifically for individuals with obesity remain scarce. This study aimed to develop and validate a depression risk prediction model for individuals with obesity using data from the United States National Health and Nutrition Examination Survey (NHANES). METHODS: A total of 6,271 individuals with obesity from the 2005–2020 NHANES cycles were included. Feature selection combined least absolute shrinkage and selection operator (LASSO) regression with multivariable logistic regression to identify robust predictors. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training dataset. Nine machine learning (ML) algorithms were developed and compared. Model interpretability was evaluated using SHapley Additive exPlanations (SHAP), and an interactive online calculator was built on the best-performing model to support practical clinical risk estimation. RESULTS: Among the nine models, the stacking ensemble model performed best, achieving an area under the receiver operating characteristic curve (AUC) of 0.82, along with strong balanced accuracy, F1 score, and Matthews correlation coefficient. SHAP analysis identified sleep disturbance, poverty-income ratio (PIR), and gender as the most influential predictors of depression risk. CONCLUSION: ML models can accurately predict depression risk in individuals with obesity. The stacking ensemble model showed the highest predictive performance, and the associated online calculator gives clinicians a practical tool for rapidly estimating individual risk, supporting informed clinical decision-making. CLINICAL TRIAL NUMBER: Not applicable. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-026-03359-7.
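As an illustration of the class-imbalance step named in METHODS, the following is a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. The abstract does not state which implementation or settings the authors used, so the function names (`smote`, `oversample_training_set`), the neighbour count `k=5`, the binary labels, and the choice to balance the classes exactly are all assumptions made for this example. In line with the abstract, only the training split is oversampled.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample_training_set(X_train, y_train, minority_label=1, k=5, seed=0):
    """Return new lists in which the minority class matches the majority count.
    Assumes binary labels; apply to the training split only, never to test data."""
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    n_majority = len(y_train) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X_train + new_rows, y_train + [minority_label] * len(new_rows)
```

Calling `oversample_training_set(X_train, y_train)` on plain lists of numeric feature rows returns the original rows followed by the synthetic ones, leaving the inputs unmodified. Keeping the held-out data untouched means evaluation metrics such as AUC are computed at the original class ratio.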