FERMam: a lightweight dual-source and multi-scale fusion framework for facial expression recognition


Abstract

Facial Expression Recognition (FER) demonstrates significant value in practical scenarios such as intelligent human-computer interaction. However, conventional FER methods often struggle to balance performance and efficiency in resource-constrained environments. Specifically, CNN-based methods struggle to capture global dependencies due to their limited receptive fields, while transformer-based methods suffer from the quadratic computational complexity of self-attention. To address these challenges, we propose a lightweight and efficient framework termed FERMam. The proposed model integrates dual-source and multi-scale features through an image fusion encoder, a facial landmark branch, and a pyramid fusion structure. The image fusion encoder combines CNN and Mamba-based selective state-space modeling to capture local structural information and global dependencies, respectively. The facial landmark branch enhances geometry-aware feature representation, and the pyramid fusion structure incorporates an Adaptive State-space Feature Refinement (ASFR) module to facilitate cross-source and cross-scale interactions with minimal computational overhead. Extensive experiments are conducted on three benchmark datasets: RAF-DB, AffectNet, and FERPlus. The results show that FERMam uses 62.81M fewer parameters and 9.73G fewer floating-point operations (FLOPs) than POSTER, and 16.7M fewer parameters and 2.43G fewer FLOPs than POSTER++, while achieving nearly the same accuracy on all three datasets. These results indicate that FERMam is well suited for deployment in resource-constrained environments. The code is available at https://github.com/jxcsglr/FERMam.
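The complementary roles of the two encoder branches described above can be illustrated with a deliberately simplified sketch. None of the functions below come from the FERMam code: `local_conv`, `ssm_scan`, and `fuse` are hypothetical stand-ins, the fixed scalar recurrence parameters `a` and `b` replace Mamba's learned, input-dependent (selective) parameters, and concatenation replaces the actual ASFR fusion. The point is only the structural contrast: a convolution sees a fixed local window, while a linear state-space recurrence gives every output a (decaying) dependence on the entire preceding sequence at linear cost.

```python
import numpy as np

def local_conv(x, kernel):
    """Toy 'CNN branch': valid 2-D cross-correlation; each output
    depends only on a small local window of the input."""
    H, W = x.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * kernel)
    return out

def ssm_scan(x, a=0.9, b=0.1):
    """Toy 'state-space branch': linear recurrence h_t = a*h_{t-1} + b*x_t,
    computed in a single O(T) scan; each output mixes the whole past."""
    h = 0.0
    out = np.zeros(len(x))
    for t, xt in enumerate(x):
        h = a * h + b * xt
        out[t] = h
    return out

def fuse(local_feat, global_feat):
    """Toy fusion: concatenate flattened features from both branches."""
    return np.concatenate([local_feat.ravel(), global_feat.ravel()])

local_feat = local_conv(np.ones((4, 4)), np.ones((3, 3)))   # shape (2, 2)
global_feat = ssm_scan(np.ones(3))                          # shape (3,)
fused = fuse(local_feat, global_feat)                       # shape (7,)
```

In the real model both branches operate on learned multi-scale feature maps rather than raw arrays, but the same asymmetry (fixed receptive field vs. sequence-length-wide dependency at linear cost) is what motivates combining them.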
