Abstract
Image fusion is a pivotal technology that integrates multimodal image information to obtain clearer imaging, with wide applications in fields such as environmental monitoring, reconnaissance, and night vision. However, most existing fusion methods neglect the image degradation caused by inclement weather in real-world scenarios, so their fused images lack clarity and detail representation in complex environments. We propose an adaptive multimodal image fusion method suited to such extreme scenarios, addressing imaging when the scene is corrupted by degradation interference. First, a pre-enhancement module based on physical parameters adaptively enhances the degraded image, performing a preliminary filtering of harmful interference in the input. Second, a gate-based sparse mixture-of-experts mechanism is introduced, guided by degradation text descriptions generated by large vision-language models; it establishes a dynamically sparse network structure, enabling the overall model to handle complex and diverse input degradations with greater flexibility. Finally, to further improve fusion performance, a composite loss function is devised that combines pixel-level loss, gradient loss, reconstruction loss, and mutual information loss, effectively improving the modal discrimination and detail retention of the fused image. Experiments on multiple public datasets, covering degraded scenarios such as smog, low light, and overexposure, show that the proposed method significantly outperforms mainstream methods in both image clarity and quantitative metrics.
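The gate-based sparse mixture-of-experts routing described above can be illustrated with a minimal sketch. This is not the paper's implementation: the gating weights, the number of experts, and top-k selection with k = 2 are all illustrative assumptions; the sketch only shows how a gate can dispatch an input to a sparse subset of experts and blend their outputs.

```python
import numpy as np

def topk_sparse_moe(x, gate_w, experts, k=2):
    """Route input x through the top-k experts chosen by a softmax gate.

    gate_w:  (num_experts, dim) gating weights (hypothetical parameters).
    experts: list of callables, one per expert; only k of them are evaluated.
    """
    logits = gate_w @ x                  # one gating score per expert
    topk = np.argsort(logits)[-k:]       # indices of the k highest-scoring experts
    # softmax over the selected experts only (sparse dispatch)
    w = np.exp(logits[topk] - logits[topk].max())
    w /= w.sum()
    # weighted combination of the selected experts' outputs
    return sum(wi * experts[i](x) for wi, i in zip(w, topk))

# Toy usage: three scaling "experts"; the gate strongly prefers expert 0,
# so the output stays close to expert 0's response.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0)]
gate_w = np.array([[10.0, 0.0, 0.0],
                   [ 0.0, 0.0, 0.0],
                   [-10.0, 0.0, 0.0]])
x = np.array([1.0, 0.0, 0.0])
out = topk_sparse_moe(x, gate_w, experts, k=2)
```

In the paper's setting, the gate would instead be conditioned on the degradation text description from the vision-language model, so that different degradations activate different expert subsets.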
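The composite loss can likewise be sketched. The exact formulations and weights are not given in the abstract, so everything below is an assumed, simplified instance: the pixel target uses max-aggregation of the two source images, the gradient loss uses finite differences rather than a Sobel operator, mutual information is estimated from a joint histogram, and the weights `w` are placeholders.

```python
import numpy as np

def gradient_loss(fused, target):
    # L1 distance between finite-difference gradients (Sobel omitted for brevity)
    gx = np.abs(np.diff(fused, axis=1) - np.diff(target, axis=1)).mean()
    gy = np.abs(np.diff(fused, axis=0) - np.diff(target, axis=0)).mean()
    return gx + gy

def mutual_information(a, b, bins=32):
    # Histogram-based MI estimate between two images
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return (p[nz] * np.log(p[nz] / (px @ py)[nz])).sum()

def composite_loss(fused, ir, vis, w=(1.0, 1.0, 1.0, 0.1)):
    target = np.maximum(ir, vis)                         # assumed pixel target
    pixel = np.abs(fused - target).mean()                # pixel-level loss
    grad = gradient_loss(fused, target)                  # gradient loss
    recon = ((fused - ir) ** 2).mean() + ((fused - vis) ** 2).mean()
    mi = -(mutual_information(fused, ir) + mutual_information(fused, vis))
    return w[0] * pixel + w[1] * grad + w[2] * recon + w[3] * mi
```

The mutual information term is negated so that minimising the total loss encourages the fused image to share information with both modalities, while the pixel and gradient terms preserve intensity and edge detail.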