Abstract
In general, high-fidelity remote sensing requires both synthetic aperture radar (SAR) images, which are available day and night in all weather conditions but can be challenging to interpret, and optical images, which are human-interpretable but are only available under favorable lighting conditions. Two of the most widely adopted strategies for combining the complementary information about the area of interest revealed in SAR and electro-optical (EO) images are Image Fusion (IF) and Image Translation (IT). IF aims to merge two or more multimodal images into a single image, while IT emphasizes translating data representations from images in the source domain to the target domain. Existing methods typically focus on either IF or IT. In this paper, we jointly exploit IF and IT for enhanced semantic segmentation. When the EO image is of high quality, SAR-optical IF is carried out based on the nonsubsampled contourlet transform (NSCT) and the intensity-hue-saturation (IHS) transform. When the EO images suffer from heavy noise due to fog, smoke, or clouds and SAR images become the last resort, an efficient end-to-end SAR-to-optical IT network based on the diffusion model is adopted. Experimental results show that the proposed DeepLab+IFIT strategy achieves an average accuracy (aAcc) of 94.86% and a mean intersection-over-union (mIoU) of 87.11% on the SpaceNet6 dataset, and an aAcc of 95.96% and an mIoU of 80.49% on the AIR-MD-SAR-Map dataset, outperforming several classic semantic segmentation networks.