Abstract
Dual-branch networks play a crucial role in real-time semantic segmentation. During feature extraction, however, sequential downsampling frequently discards fine details, and existing methods often underutilize contextual information. Traditional spatial-domain fusion approaches cannot fully integrate local and global information, which limits the network's expressive capability. To address these challenges, a Context-Guided Detail Fusion Network (CGDFNet) is developed on top of existing dual-branch frameworks to enhance feature representation while preserving image details. Specifically, a Semantic Refinement Module (SRM) is introduced in the context branch, where global semantic information is captured through adaptive pooling and local and global features are processed in parallel. In the detail branch, a Context-Guided Detail Module (CGDM) guides and reinforces high-frequency detail features by leveraging semantic information and applying detail-enhanced convolution. Additionally, a Fourier-Domain Adaptive Fusion Module (FDAFM) is designed to fuse contextual and detail features efficiently: it extracts global frequency information through a Fourier transform and dynamically combines the two branches via an adaptive gating mechanism. CGDFNet achieves 77.8% mIoU at an inference speed of 87.6 FPS on the Cityscapes test set, and 77.9% mIoU at 128.7 FPS on the CamVid test set. Experimental evaluations indicate that CGDFNet balances segmentation quality with real-time inference speed.
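To make the Fourier-domain fusion idea concrete, the following is a minimal PyTorch sketch of how a frequency-informed adaptive gate could blend a context branch and a detail branch. It is an illustrative assumption, not the paper's actual FDAFM: the class name, the use of magnitude spectra, and the 1x1-convolution gate are all hypothetical choices standing in for details the abstract does not specify.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FourierDomainAdaptiveFusion(nn.Module):
    """Hypothetical sketch of Fourier-domain adaptive fusion (not the
    paper's exact FDAFM): global frequency statistics drive a gate that
    blends context-branch and detail-branch features."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution mapping concatenated frequency magnitudes of both
        # branches to per-pixel, per-channel gate weights in (0, 1).
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, context: torch.Tensor, detail: torch.Tensor) -> torch.Tensor:
        # Real 2-D FFT captures the global frequency content of each branch.
        ctx_freq = torch.fft.rfft2(context, norm="ortho")
        det_freq = torch.fft.rfft2(detail, norm="ortho")
        # Magnitude spectra summarize global structure; resize them back to
        # the spatial resolution so the gate matches the feature-map size.
        freq_feat = torch.cat([ctx_freq.abs(), det_freq.abs()], dim=1)
        freq_feat = F.interpolate(
            freq_feat, size=context.shape[-2:],
            mode="bilinear", align_corners=False,
        )
        g = self.gate(freq_feat)
        # Adaptive gating: convex combination of the two branches.
        return g * context + (1.0 - g) * detail


# Usage with dummy dual-branch features of matching shape:
fuse = FourierDomainAdaptiveFusion(channels=128)
ctx = torch.randn(1, 128, 64, 128)   # context-branch features
det = torch.randn(1, 128, 64, 128)   # detail-branch features
out = fuse(ctx, det)                 # -> (1, 128, 64, 128)
```

The convex combination keeps the fused output on the same scale as the inputs, while the FFT-derived gate lets globally distributed frequency cues, rather than purely local statistics, decide how much each branch contributes at every location.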