Abstract
With the growing deployment of CCTV networks, miniature sensors, and IoT devices, the authenticity of captured images has become a major security concern. Advanced editing tools and generative models now make it possible to produce sophisticated forgeries that evade both human perception and traditional detection algorithms, particularly in sensor-generated content. State-of-the-art methods typically rely on a single forensic cue, such as local noise statistics or structural disruption patterns, which leaves them vulnerable to diverse types of manipulation. To address this limitation, we propose MultiFusion, a forgery detection framework that combines complementary forensic cues: SRM-based noise residuals, hierarchical texture features from EfficientNet-B0, and global structural relationships captured by a vision transformer. A dedicated DnCNN denoising preprocessing layer suppresses sensor noise while preserving fine tampering traces. For interpretability, we merge Grad-CAM maps from the convolutional stream with the transformer's attention maps to produce unified heatmaps that localize manipulated regions. Experiments on the CASIA 2.0 benchmark show a detection accuracy of 96.69% and good generalization. By integrating denoising preprocessing, multimodal feature fusion, and explainable AI, our framework advances image authentication for CCTV, sensor forensics, and IoT applications.
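To make two of the components above more concrete, the following Python sketch shows (a) extracting an SRM-style noise residual and (b) merging a Grad-CAM map with a transformer attention map into one heatmap. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify which SRM filters are used or how the two maps are combined. Here we assume the well-known 5x5 "KV" high-pass kernel from the SRM family, reflect padding, min-max normalization of each map, and a linear blend with weight `alpha`. The names `srm_residual`, `minmax`, and `fuse_heatmaps` are hypothetical.

```python
import numpy as np

# 5x5 "KV" high-pass kernel, one member of the SRM filter family.
# Assumption: the abstract does not state which SRM filters MultiFusion uses.
SRM_KV = np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
], dtype=np.float64) / 12.0


def srm_residual(gray: np.ndarray, kernel: np.ndarray = SRM_KV) -> np.ndarray:
    """Return the noise residual of a 2-D grayscale image (hypothetical helper).

    Applies the kernel as a same-size 2-D correlation with reflect padding.
    The taps sum to zero, so flat regions map to zero and only
    high-frequency, noise-like structure survives.
    """
    k = kernel.shape[0] // 2
    padded = np.pad(gray.astype(np.float64), k, mode="reflect")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate shifted copies of the padded image, weighted by each tap.
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def minmax(x: np.ndarray) -> np.ndarray:
    """Scale a map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = float(x.min()), float(x.max())
    if hi - lo < 1e-12:
        return np.zeros_like(x, dtype=np.float64)
    return (x - lo) / (hi - lo)


def fuse_heatmaps(gradcam: np.ndarray, attention: np.ndarray,
                  alpha: float = 0.5) -> np.ndarray:
    """Blend a Grad-CAM map and an attention map into one heatmap.

    Assumptions (not stated in the abstract): both maps are already resized
    to the same H x W, each is min-max normalized, and they are combined
    linearly with weight alpha on the Grad-CAM map.
    """
    if gradcam.shape != attention.shape:
        raise ValueError("maps must share the same spatial size")
    return alpha * minmax(gradcam) + (1.0 - alpha) * minmax(attention)
```

In a full pipeline, the residual would presumably feed the noise stream, and the fused map would be upsampled to the input resolution and overlaid on the image. A learned or per-image weighting could replace the fixed `alpha`, which is used here only for simplicity.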