Abstract
The field of micro-expression recognition (MER) has garnered considerable attention for its potential to reveal an individual's genuine emotional state. However, MER remains a formidable challenge, primarily due to the subtle nature and brief duration of micro-expressions. Most existing approaches rely on optical flow to capture motion between video frames. However, the motion these methods capture exhibits limited variation in expression intensity across frames, and such a fixed representation may not suit all individuals, whose micro-expressions differ markedly in intensity. To address this issue, we propose a novel framework, the Action Amplification Representation and Transformer Network (ARTNet), which adjusts the motion amplitude to make each individual's micro-expressions easier to recognize. First, we amplify the motion discrepancies between frames to enhance expression intensity. Then, we compute the optical flow of these amplified frames to depict micro-expressions more prominently. Finally, we apply transformer layers to capture the relationships among features at different amplification levels. Extensive experiments on three diverse datasets confirm the efficacy of the proposed method.