STFANet: A spatial and temporal feature aggregation network for fake face detection in videos


Abstract

Verifying video authenticity has become progressively more challenging with the rapid advancement of video synthesis technologies. However, current detection approaches predominantly rely on either intra-frame spatial artifacts or temporal inconsistencies in isolation, restricting their capacity to fully exploit the spatio-temporal characteristics of manipulated videos. To address this problem, we propose the Spatial and Temporal Feature Aggregation Network (STFANet), which employs a two-path structure to extract spatial and temporal features independently. These features are then integrated to construct high-fidelity spatio-temporal representations. Additionally, we incorporate a Vision Transformer module to capture global dependencies within the feature maps, enhancing the overall feature representation. Extensive experiments on benchmark datasets, including FaceForensics++ and Celeb-DF, validate the efficacy of the proposed approach in detecting facial forgery in videos, yielding AUC scores of 0.9933 and 0.9829, respectively. Furthermore, we investigate the impact of feature aggregation at different stages on the generated feature maps, revealing significant improvements in the quality of spatio-temporal representations.
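The two-path design described above can be illustrated with a minimal sketch. This is not the authors' implementation: the pooling functions below are hypothetical stand-ins for the paper's spatial and temporal backbones, and the fusion is a simple channel-wise concatenation; it only shows how independently extracted spatial and temporal features can be aggregated per frame.

```python
import numpy as np

def spatial_branch(frames):
    # Per-frame spatial descriptor: mean-pool each frame over its
    # height and width (a stand-in for a 2D CNN backbone).
    return np.stack([f.mean(axis=(0, 1)) for f in frames])  # (T, C)

def temporal_branch(frames):
    # Inter-frame descriptor: mean-pooled absolute frame differences
    # (a stand-in for a temporal/motion network).
    diffs = np.abs(np.diff(frames, axis=0))      # (T-1, H, W, C)
    d = diffs.mean(axis=(1, 2))                  # (T-1, C)
    return np.vstack([d, d[-1:]])                # repeat last row -> (T, C)

def aggregate(frames):
    # Fuse the two paths by channel-wise concatenation into a
    # joint spatio-temporal representation per frame.
    return np.concatenate([spatial_branch(frames),
                           temporal_branch(frames)], axis=-1)  # (T, 2C)

video = np.random.rand(8, 16, 16, 3)  # T=8 frames, 16x16 RGB
feats = aggregate(video)
print(feats.shape)  # -> (8, 6)
```

In the actual network, each branch would be a learned feature extractor and the aggregated maps would then be passed through the Vision Transformer module to model global dependencies.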
