Abstract
Underwater computer vision is hampered by light scattering, absorption, and poor illumination, which severely degrade image quality and, in turn, the performance of underwater vision tasks. To address these issues, ViT-ClarityNet, an underwater image enhancement model that integrates vision transformers with a convolutional neural network, is introduced. For comparison, ClarityNet, a transformer-free variant of the same architecture, is presented to isolate the contribution of the transformer. Because paired underwater image datasets (clear and degraded versions of the same scene) are scarce, BlueStyleGAN is proposed as a generative model that creates synthetic underwater images from clear in-air images by simulating realistic attenuation effects. BlueStyleGAN is evaluated against existing state-of-the-art synthetic dataset generators in terms of training stability and realism. ViT-ClarityNet is tested on five datasets representing diverse underwater conditions and compared with recent state-of-the-art methods as well as with ClarityNet. Evaluations include qualitative comparisons and quantitative metrics, namely UIQM, UCIQE, and the deep learning-based URanker. Additionally, the impact of enhanced images on object detection and SIFT feature matching is assessed, demonstrating the practical benefits of image enhancement for underwater computer vision tasks.