Abstract
Infrared and visible image fusion aims to integrate complementary information from heterogeneous images captured by optical sensors with distinct imaging principles; however, existing methods often exhibit modality bias, weakening salient targets or discarding crucial texture details. To address this, we propose MBFTFuse, an adversarial fusion network based on modality balancing and feature tracing. Its generator adopts a triple-path structure, with a central modality-balancing path for deep feature fusion flanked by two edge feature-tracing paths for modality-specific enhancement, and is trained against dual discriminators. Specifically, a multi-cognitive modality-balancing module equalizes feature weights across the two modalities, while a feature-tracing attention module self-enhances single-modality features to compensate for information lost during fusion. Furthermore, a pixel loss based on intensity histograms optimizes inter-modal balance at the pixel level. Comparative experiments against nine state-of-the-art methods on three public datasets demonstrate that MBFTFuse effectively highlights infrared targets while preserving fine visible textures; its superior performance in both quantitative metrics and downstream object detection underscores its value for sensor-driven computer vision.
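The histogram-based pixel loss is only named above; as a rough illustrative sketch, and not the paper's actual definition, such a loss could penalize the fused image's intensity distribution for drifting from a balanced mixture of the two source distributions, e.g.
\[
\mathcal{L}_{\mathrm{pix}} \,=\, \bigl\lVert\, h(I_{f}) \,-\, \tfrac{1}{2}\bigl(h(I_{\mathrm{ir}}) + h(I_{\mathrm{vis}})\bigr) \bigr\rVert_{1},
\]
where \(h(\cdot)\) denotes a normalized intensity histogram, \(I_{f}\) the fused image, and \(I_{\mathrm{ir}}\), \(I_{\mathrm{vis}}\) the infrared and visible inputs. The equal mixture weight and the \(\ell_{1}\) distance are assumptions made here for illustration only.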