Abstract
Visible-infrared person re-identification (VI-ReID) plays a crucial role in cross-modal surveillance by matching individuals between visible and infrared imagery. Despite recent progress, existing methods face several limitations: (1) modality heterogeneity hinders robust cross-spectral feature alignment; (2) attention mechanisms often lack adaptability to dynamic cross-modal contexts; and (3) loss functions are combined heuristically without principled weighting. To overcome these limitations, we introduce DASF-AKSA, a novel framework featuring Dynamic Adaptive Synergistic Fusion (DASF) and Adaptive Kernel Selection Attention (AKSA). The DASF module enables content-aware cross-modal fusion through learnable channel switching, while AKSA employs parallel 1D convolutions to capture multi-scale channel contexts with minimal computational overhead. We further design a Quadruple Balance-Optimized Loss framework with weights derived from gradient characteristic analysis, systematically balancing identification, center triplet, supervised contrastive, and margin-based maximum mean discrepancy (MMD) losses for stable multi-objective optimization. Experiments on the SYSU-MM01, RegDB, and LLCM datasets show that our method consistently improves over existing methods. On RegDB in particular, the DASF-AKSA network reaches 96.20% Rank-1 accuracy and 92.12% mAP, outperforming most existing approaches. Ablation studies, gradient visualization, Grad-CAM heatmaps under low-light conditions, and cross-architecture generalization tests confirm the contribution of each component.
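The abstract only names the AKSA mechanism; for concreteness, the following is a minimal PyTorch sketch of how parallel 1D convolutions over pooled channel descriptors can yield multi-scale channel attention in the spirit described. It assumes an ECA-style design with softmax-weighted kernel selection; the class name, `scale_logits` parameter, and kernel sizes are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of multi-scale channel attention via parallel 1D convolutions
# (assumption: ECA-style gating over globally pooled channel descriptors).
import torch
import torch.nn as nn

class MultiScaleChannelAttention(nn.Module):
    """Channel attention built from parallel 1D convolutions of different kernel sizes."""

    def __init__(self, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One 1D convolution per kernel size, applied along the channel axis.
        self.branches = nn.ModuleList(
            nn.Conv1d(1, 1, k, padding=k // 2, bias=False) for k in kernel_sizes
        )
        # Learnable logits that softly select among the kernel scales.
        self.scale_logits = nn.Parameter(torch.zeros(len(kernel_sizes)))

    def forward(self, x):  # x: (B, C, H, W)
        # Global average pooling -> per-channel descriptor of shape (B, 1, C).
        y = x.mean(dim=(2, 3)).unsqueeze(1)
        # Blend the multi-scale branch outputs with softmax selection weights.
        w = torch.softmax(self.scale_logits, dim=0)
        y = sum(wi * branch(y) for wi, branch in zip(w, self.branches))
        # Sigmoid gate reshaped to (B, C, 1, 1), applied channel-wise.
        return x * torch.sigmoid(y).transpose(1, 2).unsqueeze(-1)

# Usage: reweight a backbone feature map without changing its shape.
attn = MultiScaleChannelAttention()
feat = torch.randn(2, 256, 24, 12)  # e.g. mid-stage ResNet features
out = attn(feat)                    # (2, 256, 24, 12), channel-reweighted
```

Because the convolutions act on pooled 1D descriptors rather than the full feature map, this kind of design adds only a handful of parameters per attention block, consistent with the "minimal computational overhead" claim.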