Abstract
The deployment of deep learning models in Internet of Things (IoT) systems is increasingly threatened by adversarial attacks. To address the challenge of effectively detecting adversarial examples generated by Generative Adversarial Networks (AdvGANs), this paper proposes an unsupervised detection method that integrates spatial statistical features and multidimensional distribution characteristics. First, a collection of adversarial examples under four different attack intensities was constructed on the CIFAR-10 dataset. Then, based on the VGG16 and ResNet50 classification models, a dual-module collaborative architecture was designed: Module A extracted spatial statistics from convolutional layers and constructed category prototypes to calculate similarity, while Module B extracted multidimensional statistical features and characterized distribution anomalies using the Mahalanobis distance. Experimental results showed that the proposed method achieved a maximum AUROC of 0.9937 for detecting AdvGAN attacks on ResNet50 and 0.9753 on VGG16. Furthermore, it achieved AUROC scores exceeding 0.95 against traditional attacks such as FGSM and PGD, demonstrating its cross-attack generalization capability. Cross-dataset evaluation on Fashion-MNIST confirms its robust generalization across data domains. This study presents an effective solution for unsupervised adversarial example detection, without requiring adversarial samples for training, making it suitable for a wide range of attack scenarios. These findings highlight the potential of the proposed method for enhancing the robustness of IoT systems in security-critical applications.