Abstract
Image forgery localization is critical in defending against the malicious manipulation of image content, and is attracting increasing attention worldwide. In this paper, we propose a Dual-domain Fusion Swin Transformer U-Net (DFST-UNet) for image forgery localization. DFST-UNet is built on a U-shaped encoder-decoder architecture. Swin Transformer blocks are integrated into the U-Net architecture to capture long-range context information and perceive forged regions at different scales. Considering the fact that high-frequency forgery information is an essential clue for forgery localization, a dual-stream encoder is proposed to comprehensively expose forgery clues in both the RGB domain and the frequency domain. A novel high-frequency feature extractor module (HFEM) is designed to extract robust high-frequency features. A hierarchical attention fusion module (HAFM) is designed to effectively fuse the dual-domain features. Extensive experimental results demonstrate the superiority of DFST-UNet over the state-of-the-art methods in the task of image forgery localization.