Abstract
Class activation mapping (CAM) is key to understanding how convolutional neural networks (CNNs) make decisions, but current approaches face considerable challenges. First-order gradient-based methods are often affected by noise and are prone to gradient saturation, which degrades localization accuracy. These methods also tend to rely on manual selection and merging of feature maps, which limits their ability to leverage complementary information across network layers and results in weaker visual explanations. To address these issues, we propose a smooth second-order gradient class activation mapping (SSG-CAM) method. SSG-CAM incorporates second-order gradients, which capture changes in feature importance and thereby alleviate gradient saturation, and applies a smoothing technique to reduce noise. In addition, we combine SSG-CAM with the differential evolution (DE) algorithm to form a collaborative DE-SSG-CAM optimization framework, which automatically selects and fuses the optimal combination of multi-level feature maps. Extensive experiments on multiple benchmark tasks, including weakly supervised object localization and semantic segmentation, demonstrate that our method outperforms existing baselines across various metrics. Notably, the DE-SSG-CAM framework achieves a mean Intersection over Union (mIoU) of 62.38% on the complex task of localizing malarial parasite lesions in red blood cells, highlighting its strong performance in biomedical image analysis. This study thus provides an accurate and robust visual explanation tool and offers an approach for automatically distilling optimal visual interpretations from deep neural networks.