Abstract
Dance is an ancient, holistic art form practiced worldwide throughout human history. Although it offers a window into cognition, emotion, and cross-modal processing, fine-grained quantitative accounts of how its diverse information is represented in the brain remain rare. Here, we relate features from a cross-modal deep generative model of dance to functional magnetic resonance imaging responses recorded while participants watched naturalistic dance clips. We demonstrate that cross-modal features explain dance-evoked brain activity better than low-level motion and audio features. Using encoding models as in silico simulators, we quantify how dances that elicit different emotions yield distinct neural patterns. Although dance features explain the brain activity of expert dancers more broadly than that of novices, experts also exhibit greater individual variability. Our approach links cross-modal representations from generative models to naturalistic neuroimaging, clarifying how motion, music, and expertise jointly shape aesthetic and emotional experience.