Abstract
Biometric authentication based on human physiological and behavioral characteristics has been widely adopted, and speaker verification has attracted attention for its convenience and contactless nature. Conventional speaker verification systems, however, remain vulnerable to spoofing attacks and often require integration with separate spoofed-speech detection models. In this work, we propose an emotion-dependent speaker verification system that integrates speaker characteristics with the characteristics of emotional speech, improving robustness against spoofed speech without relying on additional classification models. By comparing the acoustic characteristics of emotions between registered and verification speech using pretrained models, the proposed method reduces the equal error rate (EER) relative to conventional speaker verification systems, achieving an average EER of 1.13% for speaker verification and 17.7% for the anti-spoofing task. We additionally conducted a user evaluation experiment to assess the usability of emotion-dependent speaker verification. The results indicate that although emotion-dependent authentication was initially cognitively demanding, participants adapted over time, and the burden decreased significantly after three sessions. Among the tested emotions (anger, joy, sadness, and neutral), sadness proved most effective, yielding stable scores, a low error rate, and minimal user strain. These findings suggest that neutral speech is not always the optimal choice for speaker verification and that well-designed emotion-dependent authentication can offer a practical and robust security solution.