Abstract
Most deep learning methods for image forgery detection fail to accurately detect and localize tampering operations, and they typically support only a single tampering type. To address these limitations, our method introduces three key innovations: (1) a spatial perception module that combines the spatial rich model (SRM) with constrained convolution, focusing detection on tampering traces while suppressing interference from image content; (2) a hierarchical feature learning architecture that integrates Swin Transformer with UperNet for effective multi-scale tampering pattern recognition; and (3) a comprehensive optimization strategy comprising auxiliary supervision, self-supervised learning, and hard example mining, which significantly improves model convergence and detection accuracy. Comprehensive experiments are performed on two established datasets, MixTamper and DocTamper, containing 19,600 and 170,000 images, respectively. The experimental results show that the proposed model improves IoU by 13% over leading algorithms. Additionally, it can accurately detect multiple tampering types within a single image.
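As a minimal illustration of the constrained-convolution idea mentioned above (in the Bayar-and-Stamm style commonly used for tampering-trace extraction), the sketch below projects a kernel onto the standard constraint: the center weight is fixed at -1 and the remaining weights are normalized to sum to +1, so the filter learns prediction-error residuals rather than image content. The function name and NumPy formulation are illustrative, not the paper's implementation.

```python
import numpy as np

def constrain_kernel(k):
    """Project a conv kernel onto the constrained-convolution form:
    center weight fixed at -1, remaining weights summing to +1.
    This makes the filter respond to local prediction residuals,
    suppressing smooth image content."""
    k = k.astype(float).copy()
    c = tuple(s // 2 for s in k.shape)  # index of the center tap
    k[c] = 0.0                          # exclude center from normalization
    k /= k.sum()                        # remaining weights sum to +1
    k[c] = -1.0                         # fixed center weight
    return k

# Example: constrain a random 5x5 kernel; afterwards the kernel
# sums to zero (high-pass behavior), as required by the constraint.
kernel = constrain_kernel(np.random.rand(5, 5) + 0.1)
```

In training, this projection is typically reapplied to the first-layer kernels after every gradient step, keeping the filters high-pass throughout optimization.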