Abstract
Infants preferentially process familiar social signals, but the neural mechanisms underlying continuous processing of maternal speech remain unclear. Using EEG-based neural encoding models built on temporal response functions, we investigated how 7-month-old human infants track maternal versus unfamiliar speech and whether this affects simultaneous face processing. Infants (13 boys, 12 girls) showed stronger neural tracking of their mother's voice, independent of its acoustic properties, suggesting an early neural signature of voice familiarity. Furthermore, central encoding of unfamiliar faces was diminished when infants heard their mother's voice, and face-tracking accuracy at central electrodes increased with earlier occipital face tracking, suggesting heightened attentional engagement. However, we found no evidence for differential processing of happy versus fearful faces, in contrast to previous findings on early emotion discrimination. Our results reveal interactive effects of voice familiarity on multimodal processing in infancy: while maternal speech enhances neural tracking, it may also alter how other social cues, such as faces, are processed. These findings suggest that early auditory experience shapes how infants allocate cognitive resources to social stimuli, underscoring the need to consider cross-modal influences in early development.
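The abstract names temporal response functions (TRFs) as the encoding method. As a minimal sketch of the general technique only, assuming a standard forward-model setup (this is not the authors' pipeline, and the sampling rate, lag window, and regularization strength are hypothetical), a TRF can be estimated as a time-lagged ridge regression from the speech envelope to an EEG channel, with held-out prediction accuracy serving as the neural-tracking measure:

```python
# Purely illustrative forward TRF sketch, not the authors' analysis:
# ridge regression from time-lagged speech-envelope features to one EEG
# channel. Sampling rate, lag window, and alpha are assumed values.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

fs = 64                             # assumed EEG sampling rate after downsampling (Hz)
lags = np.arange(0, int(0.5 * fs))  # stimulus-to-response lags, 0-500 ms

def lag_matrix(stimulus, lags):
    """Stack time-shifted copies of the stimulus as regression predictors."""
    X = np.zeros((len(stimulus), len(lags)))
    for j, lag in enumerate(lags):
        X[lag:, j] = stimulus[:len(stimulus) - lag]
    return X

rng = np.random.default_rng(0)
envelope = rng.standard_normal(fs * 60)  # placeholder speech envelope (60 s)
eeg = rng.standard_normal(fs * 60)       # placeholder single-channel EEG

X = lag_matrix(envelope, lags)
half = len(envelope) // 2                           # simple train/test split
trf = Ridge(alpha=1e2).fit(X[:half], eeg[:half])    # TRF weights live in trf.coef_
r, _ = pearsonr(trf.predict(X[half:]), eeg[half:])  # held-out prediction accuracy
print(f"neural tracking (prediction accuracy) r = {r:.3f}")
```

In this framing, "stronger neural tracking of the mother's voice" corresponds to higher held-out prediction accuracy for maternal than for unfamiliar speech.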