Abstract
In recent years, the rapid advancement of deep learning techniques has significantly propelled the development of face forgery methods, drawing considerable attention to face forgery detection. However, existing detection methods still struggle with generalization across different datasets and forgery techniques. In this work, we address this challenge by leveraging both local texture cues and global frequency domain information in a complementary manner to enhance the robustness of face forgery detection. Specifically, we introduce a local texture mining and enhancement module. The input image is segmented into patches and a subset is strategically masked, then texture enhanced. This joint masking and enhancement strategy forces the model to focus on generalizable localized texture traces, mitigates overfitting to specific identity features and enabling the model to capture more meaningful subtle traces of forgery. Additionally, we extract multi-scale frequency domain features from the face image using wavelet transform, thereby preserving various frequency domain characteristics of the image. And we propose an innovative frequency-domain processing strategy to adjust the contributions of different frequency-domain components through frequency-domain selection and dynamic weighting. This Facilitates the model's ability to uncover frequency-domain inconsistencies across various global frequency layers. Furthermore, we propose an integrated framework that combines these two feature modalities, enhanced with spatial attention and channel attention mechanisms, to foster a synergistic effect. Extensive experiments conducted on several benchmark datasets demonstrate that the proposed technique demonstrates superior performance and generalization capabilities compared to existing methods.