Abstract
Episodic memory retrieval engages both sensory reinstatement and internally transformed representations. Because auditory and visual information is processed in modality-specific ways, auditory and visual memories may differ in their reliance on these mechanisms. We used functional magnetic resonance imaging and multivoxel pattern analyses to examine how 25 participants (12 males and 13 females) encoded and retrieved naturalistic sounds and videos. Both auditory and visual targets reinstated event-specific, fine-grained activation patterns in the association cortex during retrieval, and reinstatement strength correlated with subjective memory vividness. However, after encoding traces were removed, auditory episodes showed a markedly greater reliance on internally transformed traces than visual episodes, as quantified by "reinstatement-free" retrieval-retrieval similarity. Sensory reinstatement correlated more with the (detail-related) posterior hippocampus, while internal representations also correlated with the (gist-related) anterior hippocampus. Furthermore, temporal voice areas preserved gist-level (human vs. nonhuman) information from encoding to retrieval, whereas fusiform face representations degraded. These findings reveal that auditory and visual memories share a common sensory reinstatement mechanism but differ in the neural mechanism that supports retrieval, with participants favoring gist over perceptual details during auditory memory retrieval.
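
For illustration only, the two quantities contrasted above can be sketched on synthetic data: event-specific encoding-retrieval similarity (reinstatement), and retrieval-retrieval similarity computed after each event's encoding pattern has been regressed out of its retrieval patterns. The abstract does not specify the estimator, so everything here is an assumption rather than the authors' pipeline: the regression-based residualization, the presence of two retrieval repetitions per event, the helper names (`pattern_similarity`, `event_specificity`, `remove_encoding_trace`), and the simulated voxel patterns.

```python
import numpy as np

def zscore_rows(x):
    # Standardize each pattern (row) across voxels.
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def pattern_similarity(a, b):
    # Pearson correlation between every row of a and every row of b.
    return zscore_rows(a) @ zscore_rows(b).T / a.shape[1]

def event_specificity(sim):
    # Mean same-event (diagonal) minus mean different-event (off-diagonal) similarity.
    n = sim.shape[0]
    same = np.trace(sim) / n
    different = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return same - different

def remove_encoding_trace(retrieval, encoding):
    # Assumed approach: regress each event's encoding pattern (plus an
    # intercept) out of its retrieval pattern and keep the residual.
    resid = np.empty_like(retrieval, dtype=float)
    for i, (r, e) in enumerate(zip(retrieval, encoding)):
        design = np.column_stack([np.ones_like(e), e])
        beta, *_ = np.linalg.lstsq(design, r, rcond=None)
        resid[i] = r - design @ beta
    return resid

# Synthetic data (hypothetical): 6 events x 200 voxels.
rng = np.random.default_rng(0)
n_events, n_voxels = 6, 200
encoding = rng.normal(size=(n_events, n_voxels))
internal = rng.normal(size=(n_events, n_voxels))  # transformed trace, absent at encoding

def simulate_retrieval():
    # Each retrieval mixes the encoding pattern, the internal trace, and noise.
    noise = rng.normal(size=(n_events, n_voxels))
    return 0.5 * encoding + 0.5 * internal + 0.5 * noise

retrieval1, retrieval2 = simulate_retrieval(), simulate_retrieval()

# Reinstatement: event-specific encoding-retrieval similarity.
ers = event_specificity(pattern_similarity(encoding, retrieval1))

# "Reinstatement-free" retrieval-retrieval similarity on residualized patterns.
resid1 = remove_encoding_trace(retrieval1, encoding)
resid2 = remove_encoding_trace(retrieval2, encoding)
rrs_free = event_specificity(pattern_similarity(resid1, resid2))

print(f"encoding-retrieval specificity: {ers:.2f}")
print(f"reinstatement-free retrieval-retrieval specificity: {rrs_free:.2f}")
```

In this toy setup, any event-specific similarity that survives residualization can only come from the simulated internal trace, which is the logic behind treating such a measure as an index of internally transformed representations.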