Abstract
Emotion recognition from EEG signals remains a challenging task due to the complex spatiotemporal properties of brain activity and substantial intersubject variability. To address these challenges, we propose the EED-CL framework, which integrates an extended EEG-Deformer (EED) with contrastive learning (CL). The proposed model incorporates a depthwise separable convolution encoder for efficient extraction of spatiotemporal EEG features, a hierarchical coarse-to-medium-to-fine Transformer (HCMFT) to capture multiscale temporal patterns, and an attentive dense information purification (ADIP) module to suppress noise and refine essential latent representations. In addition, CL-based pretraining facilitates robust feature learning even when labeled data are limited. The extracted multiscale features are fused and classified by a Transformer encoder followed by an MLP. Experiments on multiple benchmark EEG datasets show that EED consistently outperforms conventional models, and that EED-CL achieves further improvements under label-constrained conditions. Notably, EED-CL demonstrates strong robustness to intersubject variability and noise, enabling stable emotion classification even when labeled samples are scarce. These findings indicate that EED-CL effectively captures multiscale spatiotemporal EEG patterns and offers a scalable, reliable approach to EEG-based emotion recognition.