Abstract
Background: Sleep staging is essential for diagnosing sleep disorders, but polysomnography, the clinical gold standard, is too intrusive for routine home monitoring. Photoplethysmography (PPG) offers a wearable alternative, yet signal noise and inter-individual variability make high staging accuracy difficult to achieve. Methods: We developed DCA-Sleep, a deep learning framework that uses a Dual Cross-Attention (DCA) mechanism to capture long-range temporal dependencies in raw single-channel PPG. To overcome data scarcity, we adopted a cross-modality transfer learning strategy: the model was pre-trained on six electrocardiogram (ECG) datasets and then validated on a combined cohort of 9738 subjects across nine public datasets (including MESA and CFS). Results: On the MESA dataset, DCA-Sleep achieved an average F1-score of 0.731 and a Cohen's kappa of 0.652, significantly outperforming state-of-the-art baselines and demonstrating robust performance. The model showed high sensitivity for the Wake and Deep Sleep stages, which are critical for clinical assessment. Conclusions: This large-scale validation supports DCA-Sleep as a reliable, non-invasive, and scalable PPG-based tool for long-term sleep monitoring and clinical screening.
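For readers unfamiliar with the two metrics reported in the Results, the following is a minimal sketch of how an average F1-score and Cohen's kappa can be computed from per-epoch stage labels. It assumes the F1-score is macro-averaged over stages, which the abstract does not state. The four-stage label set (`W`, `L`, `D`, `R`), the function names, and the toy hypnogram are hypothetical and serve only as illustration; they are not the study's evaluation code or data.

```python
from collections import Counter

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed here)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that never occurs in either sequence contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def cohen_kappa(y_true, y_pred):
    """Agreement between two label sequences, corrected for chance agreement."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the product of the marginal label frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both sequences hold one identical label;
        # this sketch returns 0.0 in that degenerate case.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical per-epoch labels: W = Wake, L = Light, D = Deep, R = REM.
STAGES = ["W", "L", "D", "R"]
reference = ["W", "W", "L", "L", "D", "R"]  # e.g. scored from polysomnography
predicted = ["W", "L", "L", "L", "D", "R"]  # e.g. model output from PPG

f1 = macro_f1(reference, predicted, STAGES)
kappa = cohen_kappa(reference, predicted)
print(f"macro F1 = {f1:.3f}, Cohen's kappa = {kappa:.3f}")
```

Kappa is lower than raw accuracy in this toy case because it discounts the agreement expected by chance from the label frequencies alone, which is why it is commonly reported alongside F1 for imbalanced sleep-stage distributions.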