Reconstructing music perception from brain activity using a prior guided diffusion model


Abstract

Reconstructing music directly from brain activity provides insight into the neural representations underlying auditory processing and paves the way for future brain-computer interfaces. We introduce a fully data-driven pipeline that combines cross-subject functional alignment with Bayesian decoding in the latent space of a diffusion-based audio generator. Functional alignment projects individual fMRI responses onto a shared representational manifold, improving cross-participant decoding accuracy relative to anatomically normalized baselines. A Bayesian search over latent trajectories then selects the most plausible waveform candidate, stabilizing reconstructions against neural noise. Crucially, we bridge CLAP's multi-modal embeddings to music-domain latents through a dedicated aligner, eliminating the need for hand-crafted captions and preserving the intrinsic structure of musical features. Evaluated on ten diverse genres, the model achieves a cross-subject-averaged identification accuracy of [Formula: see text] and produces audio that human listeners recognize above chance in 85.7% of trials. Voxel-wise analyses localize the predictive signal to a bilateral circuit spanning early auditory, inferior-frontal, and premotor cortices, consistent with hierarchical and sensorimotor theories of music perception. The framework establishes a principled bridge between generative audio models and cognitive neuroscience.
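The abstract's first step, projecting each subject's fMRI responses onto a shared representational manifold, can be illustrated with a hyperalignment-style orthogonal Procrustes fit. This is a minimal sketch under that assumption, not the authors' implementation; `align_to_template` and all data shapes are hypothetical.

```python
# Hypothetical sketch of cross-subject functional alignment: each subject's
# voxel responses to shared stimuli are rotated into a common template space
# via orthogonal Procrustes (hyperalignment-style). Names and shapes are
# illustrative only.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def align_to_template(subject_data: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Map one subject's responses (stimuli x voxels) onto a shared template.

    Both arrays must be sampled on the same stimuli so that rows correspond.
    Returns the subject's data projected onto the shared manifold.
    """
    # Center each voxel so the rotation is estimated on response patterns
    # rather than mean offsets.
    subj = subject_data - subject_data.mean(axis=0)
    temp = template - template.mean(axis=0)
    # R minimizes ||subj @ R - temp||_F over orthogonal matrices R.
    R, _ = orthogonal_procrustes(subj, temp)
    return subj @ R

# Usage: build the template from already-aligned subjects, then project a
# held-out subject's responses into the shared space.
rng = np.random.default_rng(0)
template = rng.standard_normal((120, 500))      # 120 stimuli x 500 voxels
new_subject = rng.standard_normal((120, 500))
shared = align_to_template(new_subject, template)
```

A rotation-only map like this preserves the geometry of each subject's response space, which is one plausible reason functional alignment can beat anatomical normalization for cross-participant decoding.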
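The "Bayesian search over latent trajectories" can be pictured as sampling several candidate latents from the diffusion model and keeping the one most plausible under the fMRI-decoded posterior. The isotropic Gaussian posterior below is an assumption for illustration; the paper's scoring rule may differ.

```python
# A minimal sketch of Bayesian candidate selection: score each sampled
# latent trajectory under a Gaussian posterior whose mean was decoded from
# fMRI, and keep the highest-scoring candidate. All names are assumptions.
import numpy as np

def log_gaussian(z: np.ndarray, mu: np.ndarray, sigma: float) -> float:
    """Log-density of an isotropic Gaussian N(mu, sigma^2 I), up to a constant."""
    return -0.5 * np.sum((z - mu) ** 2) / sigma**2

def select_candidate(candidates: list[np.ndarray],
                     mu: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Return the candidate latent most plausible under the decoded posterior."""
    scores = [log_gaussian(z, mu, sigma) for z in candidates]
    return candidates[int(np.argmax(scores))]
```

Averaging over neural noise this way is what stabilizes the reconstruction: a single noisy decode no longer dictates the generated waveform.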
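The "dedicated aligner" bridging CLAP embeddings to the generator's music-domain latents could, in its simplest form, be a linear map fit on paired music clips. The ridge-regression form and every name here are assumptions; the paper's aligner may well be nonlinear.

```python
# A hedged sketch of the CLAP-to-latent aligner: a ridge-regression map from
# CLAP's multi-modal embedding space into the audio generator's latent space,
# fit on paired music clips. Purely illustrative.
import numpy as np

def fit_aligner(clap_emb: np.ndarray, music_latents: np.ndarray,
                lam: float = 1.0) -> np.ndarray:
    """Closed-form ridge solution W mapping CLAP embeddings to latents.

    clap_emb: (n_clips, d_clap); music_latents: (n_clips, d_latent).
    """
    d = clap_emb.shape[1]
    gram = clap_emb.T @ clap_emb + lam * np.eye(d)
    return np.linalg.solve(gram, clap_emb.T @ music_latents)

# At decoding time, a CLAP embedding predicted from fMRI conditions the
# generator directly: latent = clap_vector @ W. No hand-crafted caption
# is needed anywhere in the loop.
```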
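Finally, the identification accuracy reported in the abstract is commonly computed by checking whether each prediction matches its own stimulus better than any other. This sketch assumes a correlation-based criterion; the metric's exact definition is not given here.

```python
# An illustrative identification-accuracy metric: a prediction counts as
# identified when it correlates more strongly with its own stimulus's
# ground-truth embedding than with any other stimulus's. Names are assumed.
import numpy as np

def identification_accuracy(pred: np.ndarray, true: np.ndarray) -> float:
    """pred, true: (n_stimuli, n_features). Returns the fraction identified."""
    # Row-wise z-scoring turns the inner product into a Pearson correlation.
    p = (pred - pred.mean(1, keepdims=True)) / pred.std(1, keepdims=True)
    t = (true - true.mean(1, keepdims=True)) / true.std(1, keepdims=True)
    corr = p @ t.T / pred.shape[1]
    # A trial is correct when the diagonal entry is the row maximum.
    return float(np.mean(corr.argmax(axis=1) == np.arange(len(pred))))
```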
