Abstract
Oil spills pose a severe threat to marine and coastal environments, requiring accurate and timely detection to reduce ecological and economic damage. Synthetic Aperture Radar (SAR) is widely used for marine monitoring because it captures ocean surface features under all-weather, day–night conditions. However, speckle noise and look-alike phenomena in SAR imagery significantly hinder reliable spill identification. To address these challenges, this study introduces an explainable deep learning framework comprising three quantitatively defined components that work together to improve detection accuracy. First, a denoising autoencoder with two convolutional layers (16 and 32 filters) and two transposed convolution layers suppresses SAR-specific speckle noise, improving downstream feature clarity and enhancing segmentation accuracy by stabilizing texture representation. Second, a U-Net++ segmentation network with nested skip connections and three encoder–decoder stages localizes potential spill regions, providing structured spatial priors that guide the classifier toward more discriminative regions. Third, the ViR-SC ensemble classifier, which integrates five independently trained models (CNN, ResNet18, Vision Transformer, Support Vector Machine, and Random Forest), aggregates local, hierarchical, and global feature cues to improve classification robustness. The ensemble voting mechanism strengthens sensitivity to subtle slick structures while reducing errors arising from individual model biases. To ensure interpretability, Grad-CAM highlights class-discriminative spatial regions for the CNN-based models, while SHAP quantifies feature importance for the classical machine learning components. Experiments were conducted on a publicly available Sentinel-1 SAR dataset containing 5,630 labeled image patches (1,905 oil, 3,725 non-oil).
Among individual models, the Vision Transformer achieved 98.00% accuracy, while the proposed ViR-SC ensemble improved this to 98.45%, a measurable gain from component integration. Explainability results further confirm that model decisions correspond to actual oil spill structures in the imagery.
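The abstract does not specify the exact aggregation rule used by the ViR-SC ensemble, but the "voting mechanism" it mentions can be illustrated with a minimal sketch, assuming simple hard majority voting over the five base models' per-patch labels. The model names below are shorthand for the five components listed above, and the labels are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Aggregate per-model class labels for one SAR patch by simple majority.

    `predictions` maps a base-model name to its predicted label.
    Plain hard voting is assumed here; the paper may use a weighted
    or soft-voting variant instead.
    """
    counts = Counter(predictions.values())
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical votes from the five ViR-SC base models on one patch.
votes = {
    "cnn": "oil",
    "resnet18": "oil",
    "vit": "oil",
    "svm": "non-oil",
    "rf": "oil",
}
print(majority_vote(votes))  # -> oil
```

With an odd number of voters, hard voting never ties on a binary label, which is one practical reason to ensemble five rather than four models.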