Abstract
Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to match person images across visible and infrared modalities without any labeled data, a task severely hindered by the large cross-modality discrepancy and the absence of ground-truth annotations. Recent advances predominantly adopt unsupervised contrastive learning frameworks that rely on clustering-generated pseudo-labels to guide representation learning. While existing methods emphasize establishing cross-modality correspondences for modality-invariant feature learning, they often overlook the adverse impact of unreliable pseudo-labels, which frequently arise from significant intra-class variations and inter-modality misalignment. Such noisy correspondences can severely degrade model robustness and generalization. To tackle this challenge, we propose Soft Smooth Contrastive Learning with Hybrid Memory (SCLHM), a novel framework that jointly addresses noisy pseudo-labels and cross-modality divergence. Specifically, we first design a Soft Smooth Contrastive Learning (SSCL) module that mitigates the influence of noisy pseudo-labels by smoothing similarity distributions based on intra-class consistency. In addition, we introduce a Hybrid Memory Learning (HML) module that unifies modality-specific and modality-invariant feature representations, enabling more comprehensive knowledge integration. Furthermore, an Adaptive-weight Memory Update (AMU) strategy is developed to dynamically adjust memory bank updates during batch training, promoting the learning of globally discriminative and stable features. Code will be released.
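To make the memory-update idea concrete, the sketch below shows one plausible instantiation of an adaptive-weight momentum update of a cluster centroid in a memory bank. The abstract does not specify the AMU rule, so the similarity-based weighting used here (and the names `update_memory` and `base_momentum`) are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def update_memory(memory, feat, idx, base_momentum=0.2):
    """Momentum update of one centroid in a cluster memory bank.

    Hypothetical adaptive weighting: the update strength is scaled by the
    cosine similarity between the incoming feature and the stored centroid,
    so confident (well-aligned) samples update the memory more strongly.
    All features and centroids are assumed L2-normalized.
    """
    centroid = memory[idx]
    sim = float(feat @ centroid)           # cosine similarity (unit vectors)
    w = base_momentum * max(sim, 0.0)      # adaptive weight in [0, base_momentum]
    updated = (1.0 - w) * centroid + w * feat
    memory[idx] = updated / np.linalg.norm(updated)  # keep centroid unit-norm
    return memory
```

Under this rule a sample that disagrees with its assigned centroid (low or negative similarity) barely moves the memory, which is one way a batch-wise update could damp the effect of noisy pseudo-labels.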