Abstract
Deep learning-based infrared and visible image fusion methods still face major challenges in preserving high-level semantic information and comprehensive feature representations. To address this problem, we propose a novel fusion framework that integrates visual enhancement with semantic coupling for infrared and visible image fusion. Specifically, this work introduces a Visible Image Adaptive Enhancement Module (IECA), which adaptively enhances visible images by improving contrast while preserving scene details. The architecture leverages a Dual-Path Feature Fusion Network (DPFFN) to extract both local structural cues and global contextual information. To further refine feature representation, the framework integrates a Channel-Spatial Aggregation Unit (CSAU) that highlights critical spatial regions and channel-specific features. Semantic coupling is achieved through a Bimodal Interactive Attention (BIA) mechanism. In addition, a discriminator is incorporated to guide the network toward fused results with enhanced contrast and improved semantic consistency. Experiments on the MSRS, RoadScene, and TNO datasets demonstrate that the proposed approach outperforms ten existing methods in both qualitative and quantitative evaluations. Object detection experiments further indicate that our method performs well on high-level vision tasks.