SeqFusionNet: A hybrid model for sequence-aware and globally integrated acoustic representation


Abstract

Animals communicate primarily through their calls, and monitoring their vocalizations directly is essential for species conservation and biodiversity tracking. Conventional visual approaches are frequently limited by distance and surroundings, whereas call-based monitoring focuses on the animals themselves and is often more effective and practical. This paper introduces SeqFusionNet, an animal sound classification model that integrates the sequential encoding of a Transformer with the global perception of an MLP to achieve robust global feature extraction. We compiled and organized four common acoustic datasets (pig, bird, urban sound, and marine mammal) and conducted extensive experiments examining the cross-species applicability of vocal features and the model's recognition capability. Experimental results validate SeqFusionNet's efficacy in classifying animal calls: it identifies four pig call types at 95.00% accuracy, nine and six bird categories at 94.52% and 95.24% respectively, and fifteen and eleven marine mammal types at 96.43% and 97.50% accuracy, while attaining 94.39% accuracy on ten urban sound categories. Comparative analysis shows that our method surpasses existing approaches. Beyond matching reference models on UrbanSound8K, SeqFusionNet demonstrates strong robustness and generalization across species. This work offers a scalable, efficient framework for automated bioacoustic monitoring, supporting wildlife preservation, ecological research, and environmental sound analysis.
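The abstract describes a hybrid design in which Transformer-style self-attention provides sequence-aware encoding of acoustic frames and an MLP provides globally integrated classification. The sketch below is a minimal, illustrative NumPy forward pass of that general pattern; the dimensions, weight shapes, pooling choice, and the four-class output (matching the pig-call task) are all assumptions, not the paper's actual SeqFusionNet implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (T, d) sequence of acoustic frames (e.g. log-mel features).
    # Single-head scaled dot-product attention: each output frame is a
    # weighted mixture of all input frames -> sequence-aware encoding.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return scores @ V  # (T, d)

def mlp_head(h, W1, b1, W2, b2):
    # Two-layer MLP mapping a pooled (global) representation to logits.
    z = np.maximum(0.0, h @ W1 + b1)  # ReLU hidden layer
    return z @ W2 + b2

# Illustrative sizes: 50 frames, 32-dim features, 4 classes (pig-call task).
T, d, hidden, n_classes = 50, 32, 64, 4
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
W1 = rng.standard_normal((d, hidden)) * 0.1
b1 = np.zeros(hidden)
W2 = rng.standard_normal((hidden, n_classes)) * 0.1
b2 = np.zeros(n_classes)

H = self_attention(X, Wq, Wk, Wv)  # sequential encoding (Transformer-style)
pooled = H.mean(axis=0)            # global integration across time
logits = mlp_head(pooled, W1, b1, W2, b2)
pred = int(np.argmax(logits))
print(logits.shape, pred)
```

Mean pooling over time is only one way to hand the sequence encoding to the MLP; attention pooling or a learned class token would serve the same "global integration" role.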
