A music source separation method integrating time-frequency decoupling and Mamba-based state space modeling



Abstract

Music source separation, as a fundamental task in intelligent audio processing, plays a critical role in enhancing the performance of music generation, editing, and understanding systems. However, existing separation models often suffer from structural limitations such as reliance on a single modeling path, entangled time-frequency representations, and difficulty in adapting to heterogeneous sound sources. Furthermore, they struggle to maintain an effective balance between long-range dependency modeling and inference efficiency. To address these challenges, this paper proposes a novel dual-path state space modeling architecture, MSNet. By introducing decoupled modeling mechanisms for temporal and frequency pathways, and incorporating Mamba-based state space units for multidimensional structural parsing of audio signals, MSNet enhances selective control and structural representation in time-frequency modeling. Experimental results demonstrate that MSNet achieves state-of-the-art performance on the MUSDB18 dataset across multiple evaluation metrics. In particular, it shows superior robustness and stability when dealing with dynamically complex sources such as vocals and drums. Additionally, the model achieves a real-time factor (RTF) below 0.1 while maintaining superior separation quality, making it suitable for deployment in practical applications. This study not only demonstrates the feasibility of state space models for complex audio modeling but also introduces a new architectural paradigm for music source separation that balances accuracy and efficiency. The implementation is publicly available at: https://github.com/NMLAB8/Mamba-S-Net.
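The abstract describes a dual-path design in which the temporal and frequency dimensions of a spectrogram are modeled by decoupled state space pathways. The toy sketch below illustrates that general idea only: it runs a minimal scalar linear state space recurrence along the time axis (per frequency bin) and along the frequency axis (per time frame) of a magnitude spectrogram, then fuses the two outputs. All names (`ssm_scan`, `dual_path_step`), the scalar parameters, and the averaging fusion are illustrative assumptions and do not reflect MSNet's actual layers, which use Mamba's selective state space units.

```python
# Hedged sketch of dual-path time-frequency state space modeling.
# Assumption: a spectrogram is a [time][freq] list of magnitudes;
# the SSM here is a scalar linear recurrence, not a Mamba block.

def ssm_scan(xs, a=0.9, b=0.1, c=1.0):
    """Minimal linear state space recurrence:
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

def dual_path_step(spec):
    """Apply the scan along both axes of a [time][freq] spectrogram,
    then fuse the decoupled pathways (here: a simple average)."""
    T, F = len(spec), len(spec[0])
    # Temporal pathway: scan each frequency bin across time frames.
    time_out = [[0.0] * F for _ in range(T)]
    for f in range(F):
        col = ssm_scan([spec[t][f] for t in range(T)])
        for t in range(T):
            time_out[t][f] = col[t]
    # Frequency pathway: scan each frame across frequency bins.
    freq_out = [ssm_scan(row) for row in spec]
    # Fusion: average the two pathway outputs element-wise.
    return [[(time_out[t][f] + freq_out[t][f]) / 2.0 for f in range(F)]
            for t in range(T)]

toy_spec = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 frames x 2 bins
fused = dual_path_step(toy_spec)
```

A real dual-path block would replace `ssm_scan` with a learned selective SSM and interleave many such time/frequency passes; the sketch only shows why decoupling the two axes keeps each scan one-dimensional and sequential, which is what makes state space recurrences applicable.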
