Research on Acoustic Scene Classification Based on Time-Frequency-Wavelet Fusion Network

基于时频小波融合网络的声学场景分类研究

阅读:1

Abstract

Acoustic scene classification aims to recognize the scenes corresponding to sound signals in the environment, but audio differences from different cities and devices can affect the model's accuracy. In this paper, a time-frequency-wavelet fusion network is proposed to improve model performance by focusing on three dimensions: the time dimension of the spectrogram, the frequency dimension, and the high- and low-frequency information extracted by a wavelet transform through a time-frequency-wavelet module. Multidimensional information was fused through the gated temporal-spatial attention unit, and the visual state space module was introduced to enhance the contextual modeling capability of audio sequences. In addition, Kolmogorov-Arnold network layers were used in place of multilayer perceptrons in the classifier part. The experimental results show that the proposed method achieves a 56.16% average accuracy on the TAU Urban Acoustic Scenes 2022 mobile development dataset, which is an improvement of 6.53% compared to the official baseline system. This performance improvement demonstrates the effectiveness of the model in complex scenarios. In addition, the accuracy of the proposed method on the UrbanSound8K dataset reached 97.60%, which is significantly better than the existing methods, further verifying the generalization ability of the proposed model in the acoustic scene classification task.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。