Audio Deepfake Detection via a Fuzzy Dual-Path Time-Frequency Attention Network

Abstract

With the rapid advancement of speech synthesis and voice conversion, audio deepfakes pose a serious threat to information security. Existing detection methods often lack robustness to environmental noise, signal compression, and ambiguous forgery cues, which makes highly concealed fake audio difficult to identify. To address this, the paper proposes a Dual-Path Time-Frequency Attention Network (DPTFAN) based on Pythagorean Hesitant Fuzzy Sets (PHFS). Uncertainty modeling with PHFS dynamically characterizes the reliability and ambiguity of forgery features, while a dual-path attention mechanism operating in the time and frequency domains strengthens feature representation and discriminative capability. In addition, a Lightweight Fuzzy Branch Network (LFBN) explicitly enhances ambiguous features, improving performance while maintaining computational efficiency. The proposed method achieves an accuracy of 98.94% on the ASVspoof 2019 LA dataset and 99.40% on the FoR (Fake or Real) dataset, significantly outperforming existing mainstream methods and demonstrating strong detection performance and robustness.
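The abstract does not spell out how PHFS is formulated inside DPTFAN, so the following is only a minimal sketch of the standard Pythagorean fuzzy constraint that such sets build on: a membership degree mu and a non-membership degree nu must satisfy mu^2 + nu^2 <= 1, and the remaining hesitancy is sqrt(1 - mu^2 - nu^2). The function names `hesitancy` and `hesitant_element_is_valid` are hypothetical, and the max-based validity check for a hesitant element (several candidate degrees per feature) is an assumed convention, not the paper's definition.

```python
import math


def hesitancy(mu: float, nu: float) -> float:
    """Hesitancy degree of a Pythagorean fuzzy pair (mu, nu).

    mu is the membership degree, nu the non-membership degree.
    The pair is valid only if mu^2 + nu^2 <= 1.
    """
    if not (0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0):
        raise ValueError("mu and nu must lie in [0, 1]")
    total = mu * mu + nu * nu
    # Small tolerance so pairs on the boundary survive float rounding.
    if total > 1.0 + 1e-12:
        raise ValueError("Pythagorean constraint mu^2 + nu^2 <= 1 violated")
    return math.sqrt(max(0.0, 1.0 - total))


def hesitant_element_is_valid(mus, nus) -> bool:
    """Assumed check for a hesitant element with several candidate degrees.

    Assumption: the element is accepted if the largest candidate
    membership and the largest candidate non-membership together
    satisfy the Pythagorean constraint.
    """
    return max(mus) ** 2 + max(nus) ** 2 <= 1.0


if __name__ == "__main__":
    # (0.8, 0.5) sums to 1.3, so it is not a valid intuitionistic pair,
    # but 0.64 + 0.25 = 0.89 <= 1, so it is a valid Pythagorean pair.
    print(hesitancy(0.8, 0.5))
    print(hesitant_element_is_valid([0.5, 0.6], [0.3, 0.4]))
```

In a detector, a larger hesitancy value would mark a feature whose evidence for "fake" and "real" is both weak, which is the kind of ambiguity the abstract says the fuzzy modeling is meant to capture.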
