Abstract
Existing facial expression recognition (FER) methods suffer from insufficient sensitivity to fine-grained local variations and from limited feature representation capability. To address these issues, we propose MGFA, a novel FER method based on multi-granularity feature fusion with an attention mechanism. MGFA enhances facial expression feature representation by combining global and local multi-scale features. The Global Multi-scale Feature Extraction Module (GMFEM) employs different multi-scale sampling methods and channel attention mechanisms to extract and enhance global feature information. The Local Multi-granularity Feature Extraction Module (LMFEM) first applies spatial segmentation to the facial image to capture local information at different granularities, and then introduces a Multi-scale Lightweight Spatial Attention Module to strengthen attention to key local features. Furthermore, the Cross-Fusion Module (CFM) models the relationships among features extracted from regions of different granularities, improving the model's ability to capture both local and global details of facial expressions. Experimental results on three widely used public datasets demonstrate that the proposed method significantly improves the accuracy and robustness of FER.
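
To make two of the ideas above concrete, the following is a minimal illustrative sketch, not the MGFA implementation. It shows (i) spatial segmentation of an image into non-overlapping regions at several granularities and (ii) a simplified, parameter-free channel gate. The function names, the grid sizes (1, 2, 4), and the gating formula are assumptions made for illustration only; the abstract does not specify them, and a practical channel attention module would normally use learned weights.

```python
import math


def split_into_regions(image, grid):
    """Split a 2D image (list of rows) into grid x grid non-overlapping regions.

    Regions are returned in row-major order. The image height and width
    must both be divisible by `grid` (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by the grid size")
    rh, rw = height // grid, width // grid
    regions = []
    for gy in range(grid):
        for gx in range(grid):
            rows = image[gy * rh:(gy + 1) * rh]
            regions.append([row[gx * rw:(gx + 1) * rw] for row in rows])
    return regions


def multi_granularity_regions(image, grids=(1, 2, 4)):
    """Segment the same image at several granularities.

    The grid sizes are hypothetical defaults, not values from the paper.
    """
    return {g: split_into_regions(image, g) for g in grids}


def channel_gate(channels):
    """Parameter-free stand-in for channel attention (an assumption).

    Each channel is globally average-pooled, the mean is passed through a
    sigmoid, and the channel is rescaled by the resulting gate value.
    """
    gates = []
    scaled = []
    for ch in channels:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = 1.0 / (1.0 + math.exp(-mean))
        gates.append(gate)
        scaled.append([[v * gate for v in row] for row in ch])
    return scaled, gates


# Toy 4x4 "image" with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
pyramid = multi_granularity_regions(image)
region_counts = {g: len(regions) for g, regions in pyramid.items()}
```

Coarser grids keep larger facial areas together, while finer grids isolate small areas; this is the intuition behind extracting local information at different granularities before fusing it.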