Abstract
Accurate assessment of sleep architecture is critical for diagnosing and managing sleep disorders, which significantly impact global health and well-being. While polysomnography (PSG) remains the clinical gold standard, its intrusiveness, high cost, and logistical complexity limit its utility for routine or home-based monitoring. Recent studies indicate that subtle variations in respiratory dynamics, such as respiratory rate and cycle regularity, correlate meaningfully with distinct sleep stages and could serve as valuable non-invasive biomarkers. In this work, we propose a framework for estimating sleep stage distribution, specifically the proportions of Wake, Light (N1+N2), Deep (N3), and REM sleep, from respiratory audio captured over a single sleep episode. The framework comprises three principal components: (1) a segmentation module that identifies individual respiratory cycles in the audio using a fine-tuned Transformer-based architecture; (2) a feature extraction module that derives a suite of statistical, spectral, and distributional descriptors from the segmented respiratory patterns; and (3) stage-specific regression models that predict the proportion of time spent in each sleep stage. Experiments on the public PSG-Audio dataset (287 subjects; mean 5.3 h per subject), using subject-wise cross-validation, demonstrate the efficacy of the proposed approach. The segmentation model achieved lower RMSE and MAE than classical signal-processing baselines in predicting respiratory rate and cycle duration. For sleep stage proportion prediction, the proposed method achieved low RMSE and MAE across all four stages, with the TabPFN model consistently delivering the best results. By quantifying interpretable respiratory features and deliberately avoiding black-box end-to-end modeling, our system may support transparent, contact-free sleep monitoring from passive audio.
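To make the pipeline concrete, the sketch below illustrates components (2) and (3) on toy data: a handful of statistical and distributional descriptors are computed from one night's respiratory cycle durations (as would be produced by the segmentation module), and a per-stage regressor is fitted to predict stage proportions. The feature set, the random-forest choice, and all data here are illustrative assumptions, not the paper's exact descriptors or models.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor

def cycle_features(cycle_durations_s):
    """Hypothetical statistical/distributional descriptors of one
    night's respiratory cycle durations (seconds per cycle)."""
    d = np.asarray(cycle_durations_s, dtype=float)
    rate_bpm = 60.0 / d  # instantaneous respiratory rate (breaths/min)
    return np.array([
        d.mean(), d.std(), stats.skew(d), stats.kurtosis(d),
        np.percentile(d, 10), np.percentile(d, 90),
        rate_bpm.mean(), rate_bpm.std(),
    ])

# One feature vector per subject-night; targets are per-stage proportions.
rng = np.random.default_rng(0)
X = np.stack([cycle_features(rng.normal(4.0, 0.5, size=900))
              for _ in range(32)])
y_deep = rng.uniform(0.05, 0.25, size=32)  # toy N3 (Deep) proportions

# One regressor per stage (Wake, Light, Deep, REM); shown here for Deep.
# In practice, evaluation would use subject-wise cross-validation.
model_deep = RandomForestRegressor(n_estimators=200, random_state=0)
model_deep.fit(X, y_deep)
print(model_deep.predict(X[:3]))
```

Keeping each stage's model separate and the features hand-computed, as this sketch does, is what preserves the interpretability the abstract emphasizes: each prediction can be traced back to named respiratory descriptors rather than opaque end-to-end activations.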