Abstract
PURPOSE: Nursing work is characterized by high pressure, high intensity, and low substitutability. Combined with a complex work environment that includes patient-nurse relationships, colleague interactions, shift work, and night shifts, these demands pose significant sleep challenges for nurses. We developed a risk prediction model for sleep disturbances in nurses using several machine learning (ML) algorithms, with the aim of identifying the associated predictive factors more accurately.
METHODS: From November 2019 to February 2020, the research team distributed self-assessment questionnaires to 900 nurses in two tertiary general hospitals in China and collected 728 valid responses. The questionnaires covered baseline characteristics, work-related information, and physical and mental health. Multiple ML models were trained on a subset of important features selected through recursive feature elimination. Model performance was evaluated on the test set and the external validation set using the area under the curve (AUC), accuracy, sensitivity, specificity, and positive predictive value (PPV). Calibration curves and decision curve analysis (DCA) were used to assess calibration and clinical usefulness across decision thresholds. The optimal model was then selected, and the SHapley Additive exPlanations (SHAP) method was used to visualize and interpret model features and individual predictions.
RESULTS: The prevalence of sleep disturbances in Chinese nurses was 41.6%. The multi-layer perceptron (MLP) was the best-performing model, achieving the highest AUC on the test set (0.814), slightly ahead of logistic regression (0.808) and XGBoost (0.805), with a competitive accuracy of 0.725. In the external validation set, the MLP’s AUC was 0.798, with an accuracy of 0.728. The MLP exhibited balanced performance across sensitivity, specificity, PPV, negative predictive value (NPV), and F1-score. Patient/family incivility, workplace incivility, overcommitment, optimism, hope, ERR, physical fatigue, and years of work experience were identified as significant predictors of sleep disturbances among nurses.
CONCLUSION: Based on baseline data, psychological characteristics, and work environment information, we developed ML models to predict sleep disturbances in nurses. Using the SHAP method, we identified key predictive factors. This study highlights the association between work environment and psychological factors and sleep disturbances, suggesting that these factors should be considered in future intervention studies aimed at improving nurses’ sleep quality.
CLINICAL TRIAL NUMBER: Not applicable.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12912-025-03803-5.
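The evaluation metrics named above (AUC, accuracy, sensitivity, specificity, PPV, NPV, F1-score, and the net benefit plotted in a decision curve) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names and the toy labels and probabilities are hypothetical, the AUC is computed through the rank-based (Mann-Whitney) formulation, and net benefit uses the standard definition TP/n - FP/n * pt/(1 - pt) for a threshold probability pt.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when predicting positive if prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Threshold-dependent metrics derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit at one threshold probability, as used in decision curves."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
    print(classification_metrics(labels, probs, threshold=0.5))
    print("AUC:", auc(labels, probs))
    for pt in (0.25, 0.5, 0.75):
        print("net benefit at", pt, "=", net_benefit(labels, probs, pt))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the model against the "treat all" and "treat none" strategies; in practice, library implementations of these metrics would normally be preferred over hand-written ones.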