Abstract
Facial Expression Recognition (FER) is a research topic of great practical significance. However, existing FER methods still face numerous challenges, particularly in the interaction between local spatial and global information, the discrimination of subtle expression features, and the attention to key facial regions. This paper proposes a lightweight Global-Aware Spatial (GAS) Attention module designed to improve the accuracy and robustness of FER. The module extracts global semantic information from the image via global average pooling and fuses it with the local spatial features extracted by convolution, guiding the model to focus on regions highly relevant to facial expressions (such as the mouth and eyes). This effectively suppresses background noise and enhances the model's ability to perceive subtle expression variations. In addition, we introduce a Squeeze-and-Excitation (SE) Attention module into the dual-branch architecture to adaptively adjust the channel-wise weights of features, emphasizing critical region information and strengthening the model's discriminative capacity. Based on these improvements, we develop the Ada-DF++ network model. Experimental results show that the improved model achieves test accuracies of 89.21%, 66.14%, and 63.75% on the RAF-DB, AffectNet (7cls), and AffectNet (8cls) datasets, respectively, outperforming current state-of-the-art methods across multiple benchmarks and demonstrating the effectiveness of the proposed approach for FER tasks.
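To make the two attention mechanisms described above concrete, the following is a minimal NumPy sketch, not the paper's actual implementation: it assumes a single-channel spatial attention map produced by 1x1-conv-style projections for GAS, and the standard FC-ReLU-FC-sigmoid bottleneck for SE. All weight shapes and function names here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gas_attention(x, w_local, w_global):
    """Illustrative Global-Aware Spatial (GAS) attention.

    x        : feature map, shape (C, H, W)
    w_local  : hypothetical 1x1-conv weights for local features, shape (1, C)
    w_global : hypothetical projection for the pooled global vector, shape (1, C)
    Returns x reweighted by a spatial attention map of shape (1, H, W).
    """
    g = x.mean(axis=(1, 2))                              # global average pooling -> (C,)
    local = np.tensordot(w_local, x, axes=([1], [0]))    # local spatial response -> (1, H, W)
    glob = (w_global @ g).reshape(1, 1, 1)               # global context, broadcast over H, W
    attn = sigmoid(local + glob)                         # fuse global and local cues into a gate
    return x * attn                                      # emphasize expression-relevant regions

def se_attention(x, w1, w2):
    """Illustrative Squeeze-and-Excitation (SE) channel attention.

    w1 : reduction FC weights, shape (C // r, C)
    w2 : expansion FC weights, shape (C, C // r)
    """
    g = x.mean(axis=(1, 2))                              # squeeze: GAP over spatial dims -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ g, 0.0))            # excitation: FC-ReLU-FC-sigmoid -> (C,)
    return x * s[:, None, None]                          # channel-wise reweighting
```

In this sketch the sigmoid gates lie in (0, 1), so both modules act as soft masks that rescale, rather than replace, the input features; in the actual network these would be trainable convolution and fully connected layers.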