Abstract
Wearable EEG sleep monitoring devices (wEEGs) are increasingly popular in both clinical and consumer applications. However, their performance relative to polysomnography (PSG), the gold standard, remains under investigation. This meta-analysis of 43 validation studies assessed wEEGs against PSG and analyzed the influence of study design and device characteristics. The results revealed moderate to substantial agreement between wEEGs and PSG, with performance varying across sleep stages. The N1 stage posed significant classification challenges, while N3 (deep sleep) was the most reliably detected stage. Manually scored wEEG data outperformed automatic scoring for N1 detection, and a higher electrode count was associated with improved N3 classification. This study proposes a standardized reporting framework built on balanced metrics such as the Matthews correlation coefficient (MCC) and Cohen's κ to account for stage-specific variability in performance and to improve comparability across devices. The findings highlight the strengths and weaknesses of wEEGs and guide future research on refining automatic staging, contributing to the optimization of these devices for clinical and consumer applications.
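As an illustration of the balanced metrics named above, the following minimal sketch computes per-stage MCC and Cohen's κ in a one-vs-rest fashion from epoch-by-epoch labels. It is not the analysis pipeline of this study: the function names (`binary_counts`, `mcc`, `cohen_kappa`) and the eight example epochs are hypothetical, chosen only to show how the two metrics are derived from the same true/false positive and negative counts.

```python
import math

def binary_counts(y_true, y_pred, stage):
    """One-vs-rest confusion counts (tp, fp, fn, tn) for a single sleep stage."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == stage and p == stage:
            tp += 1
        elif t != stage and p == stage:
            fp += 1
        elif t == stage and p != stage:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

# Hypothetical example: eight epochs scored by PSG (reference) and by a wEEG device.
psg  = ["W", "N1", "N2", "N2", "N3", "N3", "REM", "N1"]
weeg = ["W", "N2", "N2", "N2", "N3", "N3", "REM", "N1"]

for stage in ["N1", "N3"]:
    counts = binary_counts(psg, weeg, stage)
    print(stage, round(mcc(*counts), 3), round(cohen_kappa(*counts), 3))
```

In this toy case one of the two N1 epochs is misclassified as N2, so N1 scores lower on both metrics than N3, even though raw accuracy for the N1-vs-rest decision is still 7 of 8 epochs. This is the kind of class-imbalance effect that balanced metrics are meant to expose.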