Abstract
Single image super-resolution (SISR) is a classical computer vision task that aims to reconstruct a high-resolution image from a low-resolution input, recovering sharper details and improving visual quality. In recent years, methods based on convolutional neural networks (CNNs) and on transformers with self-attention have made significant progress in super-resolution of visible-light images. However, applying either family directly to infrared images remains challenging. On the one hand, infrared images typically suffer from a low signal-to-noise ratio, blurred edges, and missing details, and local convolutions alone cannot adequately model long-range dependencies across image regions. On the other hand, although pure transformer models have strong global modeling ability, they usually have many parameters and are sensitive to the amount of training data, which makes it difficult to balance efficiency and detail restoration in infrared imaging scenarios. To address these issues, we propose RDSR (Residual Dual-branch Separable Super-Resolution Network), a hybrid architecture for infrared image super-resolution that integrates multi-scale depthwise separable convolutions with shifted-window self-attention. Specifically, we design a dual-branch spatial interaction module (BDSI) and a multi-scale separable spatial aggregation module (MSSA). The BDSI module models correlations along rows and columns through grouped convolutions in the horizontal and vertical directions, strengthening the exchange of spatial information between the convolution branch and the self-attention branch.
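To make the row- and column-wise modeling concrete, the following is a minimal plain-Python sketch of the directional part of a BDSI-style module; it is an illustration, not the authors' implementation. It assumes that the grouped convolutions are depthwise (one group per channel), that each direction uses a single odd-length 1-D kernel, and that the horizontal and vertical responses are fused by summation. The function names `depthwise_strip_conv` and `directional_interaction` are hypothetical, and how the fused result is exchanged between the convolution and self-attention branches is not modeled here.

```python
from typing import List

# Feature map layout: [channel][row][col]
Feature = List[List[List[float]]]


def depthwise_strip_conv(x: Feature, kernels: List[List[float]], axis: str) -> Feature:
    """Per-channel 1-D convolution along rows (axis='h') or columns (axis='v').

    kernels[c] is an odd-length 1-D kernel applied only to channel c, i.e. a
    grouped convolution with groups == channels. Zero padding keeps the size.
    """
    if axis not in ("h", "v"):
        raise ValueError("axis must be 'h' or 'v'")
    out: Feature = []
    for channel, k in zip(x, kernels):
        h, w, r = len(channel), len(channel[0]), len(k) // 2
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for t, weight in enumerate(k):
                    # Offset only along the chosen direction.
                    ii, jj = (i, j + t - r) if axis == "h" else (i + t - r, j)
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += weight * channel[ii][jj]
                res[i][j] = acc
        out.append(res)
    return out


def directional_interaction(x: Feature, h_kernels: List[List[float]],
                            v_kernels: List[List[float]]) -> Feature:
    """Hypothetical BDSI-style fusion: sum of horizontal and vertical responses."""
    hx = depthwise_strip_conv(x, h_kernels, "h")
    vx = depthwise_strip_conv(x, v_kernels, "v")
    return [[[a + b for a, b in zip(row_h, row_v)]
             for row_h, row_v in zip(ch_h, ch_v)]
            for ch_h, ch_v in zip(hx, vx)]


# One channel with a single bright pixel in the centre.
x = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]
ones = [[1.0, 1.0, 1.0]]
print(directional_interaction(x, ones, ones))
# [[[0.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 0.0]]]
```

With all-ones kernels of length 3, the single nonzero pixel spreads into a cross along its own row and column. A 1×k plus k×1 pair costs 2k multiplications per position instead of k² for a square k×k kernel, so longer strips can reach further along rows and columns at linear rather than quadratic cost.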
The MSSA module replaces the conventional MLP with three parallel depthwise separable convolution branches, improving feature representation and nonlinear modeling through multi-scale spatial aggregation and a star-shaped gating operation. Experimental results on multiple public infrared image datasets show that, for ×2 and ×4 upscaling, RDSR achieves higher PSNR and SSIM than CNN-based methods such as EDSR, RCAN, and RDN and transformer-based methods such as SwinIR, DAT, and HAT, demonstrating the effectiveness of the proposed modules and of the overall framework.
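The MSSA description can likewise be illustrated with a plain-Python sketch under stated assumptions; it is not the authors' code. Here each branch is a depthwise k×k convolution followed by a pointwise (1×1) convolution, the three branches use kernel sizes such as 3, 5, and 7, and the star-shaped gating is modeled as an element-wise product in which the sum of two branches is gated by the third. The kernel sizes, the fusion rule `(b0 + b1) * b2`, and all function names are hypothetical choices for illustration; activations and normalization are omitted.

```python
from typing import List, Sequence, Tuple

# Feature map layout: [channel][row][col]
Feature = List[List[List[float]]]
Kernel2D = List[List[float]]


def depthwise_conv2d(x: Feature, kernels: Sequence[Kernel2D]) -> Feature:
    """Per-channel k x k cross-correlation with zero 'same' padding (groups == channels)."""
    out: Feature = []
    for channel, k in zip(x, kernels):
        h, w, r = len(channel), len(channel[0]), len(k) // 2
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for a, krow in enumerate(k):
                    for b, weight in enumerate(krow):
                        ii, jj = i + a - r, j + b - r
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += weight * channel[ii][jj]
                res[i][j] = acc
        out.append(res)
    return out


def pointwise_conv(x: Feature, weights: List[List[float]]) -> Feature:
    """1 x 1 convolution: weights[o][c] mixes input channel c into output channel o."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wo[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
             for i in range(h)]
            for wo in weights]


def separable_branch(x: Feature, dw: Sequence[Kernel2D], pw: List[List[float]]) -> Feature:
    """Depthwise separable convolution: per-channel spatial filtering, then channel mixing."""
    return pointwise_conv(depthwise_conv2d(x, dw), pw)


def mssa_sketch(x: Feature,
                branches: Sequence[Tuple[Sequence[Kernel2D], List[List[float]]]]) -> Feature:
    """Three parallel separable branches fused by a hypothetical star gate: (b0 + b1) * b2."""
    if len(branches) != 3:
        raise ValueError("expected exactly three branches")
    b0, b1, b2 = (separable_branch(x, dw, pw) for dw, pw in branches)
    return [[[(p + q) * g for p, q, g in zip(r0, r1, r2)]
             for r0, r1, r2 in zip(c0, c1, c2)]
            for c0, c1, c2 in zip(b0, b1, b2)]


def delta_kernel(k: int) -> Kernel2D:
    """k x k identity kernel (1 at the centre), useful for checking the wiring."""
    return [[1.0 if a == b == k // 2 else 0.0 for b in range(k)] for a in range(k)]


x = [[[1.0, 2.0], [3.0, 4.0]]]
branches = [([delta_kernel(k)], [[1.0]]) for k in (3, 5, 7)]
print(mssa_sketch(x, branches))  # [[[2.0, 8.0], [18.0, 32.0]]]
```

The element-wise product is what makes this block nonlinear even without an activation function: with identity kernels every branch returns its input x, so the output becomes 2x², a quadratic function of the input.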