Abstract
OBJECTIVE: This study aimed to develop, validate, and visualize a novel machine learning (ML)-based model for predicting depression risk in patients with sleep disorders.

METHODS: Using data from the National Health and Nutrition Examination Survey (NHANES, 2005–2020), 11 machine learning models were constructed: Least Absolute Shrinkage and Selection Operator (LASSO), Ridge Regression (Ridge), Elastic Net (ENet), Light Gradient Boosting Machine (LightGBM), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP). Model performance was evaluated using multiple metrics, including the area under the receiver operating characteristic curve (AUC). Calibration curves and Decision Curve Analysis (DCA) were used to assess model calibration and clinical applicability. SHapley Additive exPlanations (SHAP) values were used for model interpretation, and an online web calculator was developed to visualize the model.

RESULTS: Among the 11 models, LightGBM performed best, with an AUC of 0.73. Calibration curves for both the training and test sets indicated good calibration. The SHAP summary plot showed that the three most important features were age, poverty-income ratio (PIR), and marital status. The model was integrated into an interactive web application that allows clinicians to estimate depression risk from 10 key clinical variables.

CONCLUSION: This study developed a predictive model for depression risk in patients with sleep disorders that demonstrated strong discriminatory ability and good clinical applicability. The online application gives clinicians a user-friendly tool for assessing depression risk and guiding targeted prevention and intervention strategies.

CLINICAL TRIAL NUMBER: Not applicable.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12888-025-07730-2.
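The evaluation tools named in the abstract (AUC, Decision Curve Analysis, calibration curves) can be made concrete with a small sketch. The following is a minimal, dependency-free Python illustration under stated assumptions, not the authors' code: it computes the AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, the decision-curve net benefit NB = TP/N - (FP/N) * pt / (1 - pt) at a threshold probability pt, and a binned calibration table (mean predicted risk vs. observed event rate). The function names and the toy labels and scores are hypothetical and invented for illustration; they are not NHANES data.

```python
"""Illustrative sketch only: AUC, decision-curve net benefit, and a binned
calibration table. Not the study's code; the toy data below are hypothetical."""

from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC = P(random positive scored above random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk >= threshold (pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference 'treat all' strategy; 'treat none' has net benefit 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def calibration_table(
    y_true: Sequence[int], y_score: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """(mean predicted risk, observed event rate, count) per equal-width bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, y_score):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((y, s))
    table = []
    for members in bins:
        if members:  # skip empty bins
            mean_pred = sum(s for _, s in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table


# Hypothetical toy data (1 = depression, 0 = none); NOT from NHANES.
Y_TOY = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
P_TOY = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

if __name__ == "__main__":
    print(f"AUC = {roc_auc(Y_TOY, P_TOY):.3f}")
    for pt in (0.2, 0.5):
        print(f"pt={pt}: model NB={net_benefit(Y_TOY, P_TOY, pt):.3f}, "
              f"treat-all NB={net_benefit_treat_all(Y_TOY, pt):.3f}")
    for mean_pred, observed, count in calibration_table(Y_TOY, P_TOY, n_bins=2):
        print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={count}")
```

Plotting `net_benefit` over a range of thresholds alongside the treat-all curve and the zero line (treat none) yields a decision curve; a model is clinically useful at thresholds where its curve lies above both references. In practice, established implementations such as scikit-learn's `roc_auc_score` and `calibration_curve` would normally be preferred over hand-rolled functions like these.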