Abstract
Micro-expressions are extremely subtle, short-lived facial muscle movements that often reveal an individual's genuine emotions. Micro-expression recognition (MER) remains highly challenging, however, because of their short duration, low motion intensity, and the imbalanced class distribution of available training samples. To address these issues, this paper proposes a Global-Local Feature Fusion Network (GLFNet) that extracts discriminative features for MER. GLFNet consists of three core modules: the Global Attention (GA) module, which captures subtle variations across the entire facial region; the Local Block (LB) module, which partitions the feature map into four non-overlapping regions to emphasize salient local movements while suppressing irrelevant information; and the Adaptive Feature Fusion (AFF) module, which employs an attention mechanism to dynamically adjust channel-wise weights for efficient integration of global and local features. In addition, a class-balanced loss function replaces the conventional cross-entropy loss, mitigating the class imbalance common to micro-expression datasets. Extensive experiments are conducted on three benchmark databases, SMIC, CASME II, and SAMM, under two evaluation protocols. Under the Composite Database Evaluation protocol, GLFNet consistently outperforms existing state-of-the-art methods: the unweighted F1-scores on the Combined, SAMM, CASME II, and SMIC datasets improve by 2.49%, 2.02%, 0.49%, and 4.67%, respectively, over the current best methods. These results validate the effectiveness of the proposed global-local feature fusion strategy for micro-expression recognition.
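The channel-wise attention fusion attributed to the AFF module can be illustrated with a minimal NumPy sketch. The single gating layer, its parameter shapes, and the convex global/local mix below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_feature_fusion(global_feat, local_feat, w, b):
    """Fuse two (C, H, W) feature maps with channel-wise attention.

    w (C x 2C) and b (C,) parameterize a tiny gating layer; these
    shapes are hypothetical, chosen only to make the sketch concrete.
    """
    # Global average pooling yields one descriptor per channel.
    g = global_feat.mean(axis=(1, 2))                  # (C,)
    l = local_feat.mean(axis=(1, 2))                   # (C,)
    # A channel-wise gate in (0, 1) decides the global/local mix.
    alpha = sigmoid(w @ np.concatenate([g, l]) + b)    # (C,)
    alpha = alpha[:, None, None]                       # broadcast over H, W
    return alpha * global_feat + (1.0 - alpha) * local_feat

C, H, W = 8, 4, 4
rng = np.random.default_rng(0)
fused = adaptive_feature_fusion(
    rng.normal(size=(C, H, W)), rng.normal(size=(C, H, W)),
    0.1 * rng.normal(size=(C, 2 * C)), np.zeros(C))
print(fused.shape)  # (8, 4, 4)
```

Because the gate is a sigmoid, each channel of the output is a convex combination of the corresponding global and local channels, so neither branch can dominate unconditionally.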
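The class-balanced loss mentioned in the abstract can be sketched with one common reweighting scheme based on the "effective number of samples"; the paper's exact formulation may differ, and the class counts below are made up for illustration:

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples.

    Rare classes receive larger weights; this is one standard
    class-balanced reweighting, not necessarily the paper's variant.
    """
    effective_num = 1.0 - np.power(beta, samples_per_class)
    weights = (1.0 - beta) / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights * len(samples_per_class) / weights.sum()

def class_balanced_cross_entropy(logits, label, weights):
    """Weighted cross-entropy for a single sample."""
    z = logits - logits.max()                 # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -weights[label] * log_probs[label]

counts = np.array([300, 120, 40])             # imbalanced class counts
w = class_balanced_weights(counts)
loss = class_balanced_cross_entropy(np.array([2.0, 0.5, -1.0]), 2, w)
```

The rarest class (40 samples) ends up with the largest weight, so misclassifying it costs more than misclassifying the majority class, which is the intended counter to imbalanced training sets.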