Transformer-based language-independent gender recognition in noisy audio environments


Abstract

This study proposes a language-independent method for identifying the gender of a speaker from an audio clip in a noisy environment. Two different representations of the audio clips are examined, a Mel-spectrogram and the output of the Wav2Vec2 acoustic model, and the advantages and disadvantages of each are compared. A series of experiments is presented across five languages (English, Arabic, Spanish, French, and Russian), each containing male and female audio clips. These languages are analyzed individually, and their characteristics are compared against a combined five-language model. The goal of this study is to distinguish the gender of the speaker from an audio clip, regardless of language or complex background noise such as nightclubs or stadiums. Additionally, this research addresses the critical issue of gender bias in voice recognition systems. It highlights the challenges posed by the over-representation of male voices in training datasets and the resulting impact on the accuracy and fairness of gender classification, particularly for female voices. To ensure balance and mitigate this bias, the approach maintains an equal number of audio clips for male and female voices. The experimental results indicate that the traditional spectrogram method outperformed the Wav2Vec2 transformer method: for Russian, the spectrogram method achieved 99% accuracy, while the Wav2Vec2 transformer method achieved only 89%. Tests in both noisy and silent environments show that a model trained on both conditions achieved better accuracy. The results also indicate that a model trained on data from a wide variety of languages yielded higher results. These findings offer important insights for developing more reliable, accurate, and equitable acoustic gender-detection systems.
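The abstract contrasts two front ends: a Mel-spectrogram and Wav2Vec2 embeddings. As a point of reference, the Mel-spectrogram pipeline can be sketched from first principles with numpy alone: frame the waveform, window it, take the power spectrum, and project it onto a bank of triangular mel-scale filters. The parameter values below (16 kHz sample rate, 512-point FFT, 160-sample hop, 64 mel bands) are illustrative assumptions, not the paper's reported settings; the Wav2Vec2 path would instead feed the raw waveform through a pretrained transformer encoder.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping (HTK convention)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    # Frame the signal, apply a Hann window, take the power spectrum
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Project onto the mel filterbank and compress with a log
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)

# Example: one second of a synthetic 220 Hz tone (illustrative input only)
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 220.0 * t), sr=sr)
print(spec.shape)  # (time frames, mel bands)
```

The resulting (frames, n_mels) matrix is the image-like input a convolutional classifier would consume; the Wav2Vec2 alternative replaces this hand-crafted front end with learned contextual features.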
