Abstract
INTRODUCTION: The ability to recognize emotions in others is fundamental to social interaction, yet the precise temporal dynamics by which the brain integrates contextual cues with facial expressions remain unclear. This study used behavioral measures and event-related potentials (ERPs) to investigate how contextual congruency and emotional valence modulate facial emotion recognition in a neurotypical population. METHODS: Participants viewed emotional faces preceded by congruent or incongruent bimodal cues combining vocalizations and visual images. RESULTS: Behaviorally, participants responded faster and made fewer errors on congruent trials than on incongruent trials, indicating that context facilitates emotional processing. At the neural level, incongruent cues elicited a significantly larger P1 component, suggesting that the brain allocates increased early attentional resources to conflicting stimuli. Furthermore, the P3 component was significantly larger for negative stimuli than for neutral stimuli, highlighting the role of emotional valence in later stages of cognitive processing. DISCUSSION: Together, these findings support a multi-stage model of emotional integration in which contextual incongruency affects processing from early perceptual encoding to later cognitive evaluation. By integrating behavioral and neural evidence, this study clarifies the temporal course of contextual integration in multisensory emotion perception and offers insights with implications for clinical and applied research.