Abstract
With the development of cross-modal image fusion in multi-sensor systems, fusion technologies have made significant progress in feature extraction, enabling more effective image analysis. However, insufficient fusion of information can weaken the correlation between the source and fused images, often causing critical features of the original modalities to be omitted. To preserve as much information as possible, and in particular to extract the effective features of the source images completely, this paper proposes a new cross-modal image fusion method based on low-rank representation and convolutional sparse learning, named LRCFuse. First, learned low-rank representation (LLRR) blocks perform dimensionality reduction on the source images while extracting their low-rank and sparse feature components. However, because low-rank representation alone has limited capacity to model images of different modalities, we introduce common feature preservation module (CFPM) blocks based on convolutional sparse coding. Through the CFPM, LRCFuse recovers features common to both source images, mitigating the loss caused by the imperfect assumptions of low-rank representation. On this basis, a multi-level optimization strategy combining pixel loss, shallow-level loss, mid-level loss, deep-level loss, and Sobel loss is proposed to hierarchically learn and refine diverse image features. Quantitative and qualitative evaluations on multiple datasets show that LRCFuse effectively detects infrared salient targets, preserves more details from visible images, and achieves better fusion results for subsequent downstream tasks.
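The multi-level optimization strategy above can be sketched as a weighted sum of per-level terms. This is a minimal illustration only: the intensity target (element-wise maximum of the sources), the encoder producing the shallow/mid/deep feature maps, and the loss weights are all assumptions of this sketch, not the paper's actual formulation or hyperparameters.

```python
# Illustrative sketch of a multi-level fusion loss (pixel + shallow/mid/deep
# feature losses + Sobel gradient loss). All design choices here are
# assumptions for demonstration, not the LRCFuse paper's exact losses.
import torch
import torch.nn.functional as F


def sobel_grad(x):
    """Approximate image gradient magnitude with fixed Sobel kernels."""
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return gx.abs() + gy.abs()


def multi_level_loss(fused, src_a, src_b, feats_fused, feats_a, feats_b,
                     weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Combine pixel, shallow-, mid-, deep-level, and Sobel losses.

    feats_* are lists [shallow, mid, deep] of feature maps, assumed to come
    from some shared feature extractor applied to each image.
    """
    w_pix, w_shallow, w_mid, w_deep, w_sobel = weights

    # Pixel loss against a simple intensity target (assumed: element-wise max).
    target = torch.max(src_a, src_b)
    pixel = F.l1_loss(fused, target)

    # Feature-level losses at shallow, mid, and deep layers.
    feat_losses = [F.l1_loss(ff, torch.max(fa, fb))
                   for ff, fa, fb in zip(feats_fused, feats_a, feats_b)]

    # Sobel (edge) loss: match the stronger gradient of the two sources.
    sobel = F.l1_loss(sobel_grad(fused),
                      torch.max(sobel_grad(src_a), sobel_grad(src_b)))

    return (w_pix * pixel
            + w_shallow * feat_losses[0]
            + w_mid * feat_losses[1]
            + w_deep * feat_losses[2]
            + w_sobel * sobel)
```

Because every term is an L1 distance, the total is a non-negative scalar that can be minimized end-to-end with any standard optimizer; the per-level weights let the training emphasize pixel fidelity, semantic features, or edge sharpness as needed.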