Abstract
Background/Objectives: Decoding affective states from electroencephalography (EEG) signals is fundamental to non-invasive brain-computer interfaces. Despite recent advances, accurate recognition is impeded by the inherently non-stationary nature of physiological signals and by the entanglement of spatio-temporal dynamics within high-dimensional recordings. While Transformers excel at global modeling, they often neglect the continuous dynamical properties of neural signals and suffer from quadratic complexity. Methods: In this paper, we propose Spatio-Temporal Hybrid Mamba-Attention (STHMA), a framework designed to explicitly disentangle and model EEG dynamics via linear-complexity state space models. First, to incorporate domain knowledge, we introduce a Dual-Domain Physics-Aware Embedding module, which fuses learnable temporal convolutions with explicit frequency-domain spectral features, ensuring fidelity to neurophysiological principles. Second, we propose a novel Decoupled Spatial-Temporal Scanning strategy: by dynamically reconfiguring the serialization of the data tensor, the model strictly separates the learning of instantaneous functional connectivity from the tracking of emotional-state evolution, thereby preventing the structural collapse common in 1D sequence models. Results: Extensive experiments on the FACED and SEED-V datasets demonstrate that STHMA achieves state-of-the-art performance, far exceeding the random-chance baselines (11.11% for 9-class FACED and 20.00% for 5-class SEED-V). Conclusions: These results validate that combining Physics-Aware Embeddings with decoupled state-space modeling offers a scalable and effective paradigm for EEG emotion recognition.