Utterance-Style-Dependent Speaker Verification Using Emotional Embedding with Pretrained Models



Abstract

Biometric authentication using human physiological and behavioral characteristics has been widely adopted, with speaker verification attracting attention for its convenience and noncontact nature. However, conventional speaker verification systems remain vulnerable to spoofing attacks and often require integration with separate spoofed-speech detection models. In this work, the authors propose an emotion-dependent speaker verification system that integrates speaker characteristics with emotional speech characteristics, enhancing robustness against spoofed speech without relying on additional classification models. By comparing the acoustic characteristics of emotions between enrollment and verification speech using pretrained models, the proposed method reduces the equal error rate compared to conventional speaker verification systems, achieving an average equal error rate of 1.13% for speaker verification and 17.7% for the anti-spoofing task. The authors additionally conducted a user evaluation experiment to assess the usability of emotion-dependent speaker verification. The results indicate that although emotion-dependent authentication was initially cognitively stressful, participants adapted over time, and the burden was significantly reduced after three sessions. Among the tested emotions (anger, joy, sadness, and neutral), sadness proved most effective, with stable scores, a low error rate, and minimal user strain. These findings suggest that neutral speech is not always the optimal choice for speaker verification and that well-designed emotion-dependent authentication can offer a practical and robust security solution.
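The two quantities the abstract reports, a fused similarity between enrollment and verification speech and the equal error rate (EER), can be sketched as follows. This is a minimal illustration, not the paper's implementation: the fusion weight `alpha`, the toy embeddings, and the simple threshold sweep for the EER are all assumptions made here for clarity.

```python
# Hedged sketch of emotion-dependent verification scoring and EER computation.
# The fusion weight `alpha` and all score values below are illustrative
# assumptions; the paper's actual embedding extractors and scoring differ.
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def verify_score(spk_enroll, spk_test, emo_enroll, emo_test, alpha=0.7):
    """Fuse speaker-embedding and emotion-embedding similarities.

    `alpha` (assumed here) weights speaker identity against emotional style.
    """
    return alpha * cosine(spk_enroll, spk_test) + (1 - alpha) * cosine(emo_enroll, emo_test)

def equal_error_rate(genuine, impostor):
    """EER: operating point where false-accept rate equals false-reject rate.

    Sweeps every observed score as a candidate threshold and returns the
    mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = 1.0, None
    for t in sorted(genuine + impostor):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuines rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# Toy, well-separated scores: genuine trials score high, impostors low.
genuine = [0.92, 0.88, 0.95, 0.90, 0.85]
impostor = [0.40, 0.55, 0.35, 0.60, 0.52]
print(equal_error_rate(genuine, impostor))  # → 0.0 (scores are fully separable)
```

With perfectly separable toy scores the EER is 0%; the paper's reported 1.13% (verification) and 17.7% (anti-spoofing) correspond to partially overlapping genuine and impostor score distributions.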
