Abstract
As AI-generated images become increasingly photorealistic, distinguishing them from natural images poses a growing challenge. This paper presents a robust detection framework that leverages multiple uncertainty measures to decide whether to trust or reject a model's predictions. We focus on three complementary techniques: Fisher Information, which captures the sensitivity of model parameters to input variations; entropy-based uncertainty from Monte Carlo (MC) Dropout, which reflects predictive variability; and predictive variance from a Deep Kernel Learning (DKL) framework using a Gaussian Process (GP) classifier. To integrate these diverse uncertainty signals, we employ Particle Swarm Optimisation (PSO) to learn optimal weightings and determine an adaptive rejection threshold. The model is trained on Stable Diffusion-generated images and evaluated on GLIDE, VQDM, Midjourney, BigGAN, and StyleGAN3, each presenting a significant distribution shift. While standard metrics such as prediction probability and Fisher-based measures perform well in-distribution, they degrade under shift. In contrast, the Combined Uncertainty measure consistently rejects approximately 70% of incorrect predictions on unseen generators, filtering out most misclassified AI samples. Although the system occasionally rejects correct predictions from newer generators, this conservative behaviour remains acceptable, as rejected synthetic samples serve as valuable input for retraining. Crucially, the system maintains high acceptance of accurate predictions on natural images and in-domain AI data. Under adversarial attacks (FGSM and PGD), the Combined Uncertainty method rejects around 61% of successful attacks, while the GP-based uncertainty alone rejects up to 80%. Notably, the Combined approach maintains strong selectivity, rarely rejecting correct predictions. Overall, our findings highlight the benefit of multi-source uncertainty fusion for resilient and adaptive detection of AI-generated images.
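To make the fusion step concrete, the sketch below shows one plausible form of the weighted combination and rejection rule described above. It is a minimal illustration, not the paper's implementation: the function names, the assumption that each uncertainty signal is pre-normalised to [0, 1], and the hard-coded weights and threshold (which PSO would learn jointly in the actual framework) are all illustrative assumptions.

```python
import numpy as np

def combined_uncertainty(fisher, mc_entropy, gp_variance, weights):
    """Weighted fusion of three uncertainty signals (illustrative sketch).

    Each per-sample signal is assumed to be min-max normalised to [0, 1]
    so the PSO-learned weights operate on a common scale.
    """
    u = np.stack([fisher, mc_entropy, gp_variance], axis=-1)  # shape (n, 3)
    return u @ weights  # convex combination when weights sum to 1

def accept_or_reject(scores, threshold):
    """Reject a prediction when its fused uncertainty exceeds the threshold."""
    return np.where(scores > threshold, "reject", "accept")

# Toy per-sample uncertainties (assumed values, not real data).
fisher      = np.array([0.10, 0.80, 0.35])
mc_entropy  = np.array([0.05, 0.90, 0.40])
gp_variance = np.array([0.20, 0.70, 0.30])

# In the paper, PSO searches for these jointly; here they are hard-coded.
weights   = np.array([0.3, 0.4, 0.3])
threshold = 0.5

scores = combined_uncertainty(fisher, mc_entropy, gp_variance, weights)
print(scores)                               # approx. [0.11  0.81  0.355]
print(accept_or_reject(scores, threshold))  # ['accept' 'reject' 'accept']
```

In this sketch, rejected samples would be routed back for human review or retraining, matching the conservative behaviour the abstract describes for out-of-distribution generators.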