Abstract
EEG signals are widely used in emotion recognition because they enable objective quantification of emotional states. However, although these signals contain rich frequency and spatial information, extracting fine-grained discriminative features from them remains challenging. We propose SC-SDT (Spectral Convolution-Spatial Differential Transformer), a novel framework that jointly models spectral and spatial characteristics through an integrated convolutional and transformer architecture. First, a Spectral Feature Embedding module employs a sequential group-pointwise convolutional network to dynamically capture both local spectral patterns within frequency bands and global interactions across the full spectrum. Next, a Spatial Feature Extraction module uses a core differential attention mechanism to simultaneously suppress attention noise and refine functional connectivity mapping across EEG channels. Finally, to improve robustness against inter-subject variability, we introduce a supervised contrastive loss that explicitly enforces subject-invariant feature representations while preserving class discriminability. Under a subject-independent experimental paradigm, we rigorously evaluate SC-SDT on the SEED, SEED-IV, and DEAP datasets to assess cross-subject generalization. Experimental results show that SC-SDT achieves competitive emotion classification performance by effectively modeling spectral-spatial neural signatures. Analysis of its key components further shows that the model not only pioneers the application of differential attention to EEG, but also provides a methodological foundation for efficient spectral-spatial feature extraction. The code for this paper is accessible at https://github.com/apolloCoder-byte/SC-SDT.
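To make the differential attention idea concrete: in its general form (as popularized by the DIFF Transformer), two softmax attention maps are computed from separate query/key projections and one is subtracted from the other, scaled by a factor λ, so that common-mode attention noise cancels. The NumPy sketch below is illustrative only, assuming EEG channels act as tokens (e.g. 62 channels for SEED) with per-channel spectral features as embeddings; the weight names and shapes are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """Single-head differential attention over EEG channels.

    x : (n_channels, d_model) per-channel feature embeddings
    Wq1, Wk1, Wq2, Wk2 : (d_model, d_head) projection matrices
    Wv : (d_model, d_model) value projection
    lam : scalar weight for the subtracted attention map
    """
    d = Wq1.shape[1]
    # Two independent softmax attention maps over channel pairs.
    a1 = softmax((x @ Wq1) @ (x @ Wk1).T / np.sqrt(d))
    a2 = softmax((x @ Wq2) @ (x @ Wk2).T / np.sqrt(d))
    # Subtracting the second map cancels noise common to both.
    attn = a1 - lam * a2
    return attn @ (x @ Wv)
```

In practice λ is typically learned per head rather than fixed, and the resulting (n_channels × n_channels) differential map can be read as a denoised functional connectivity pattern across electrodes.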