Abstract
Cross-modality person re-identification (Re-ID) aims to match infrared and RGB images of the same person across non-overlapping camera views. Most existing methods rely on images of the same person captured under both the infrared and RGB modalities, yet such cross-modality pairs are difficult to collect since each pedestrian appears only within a short time span. Owing to the large appearance difference between the two modalities, matching infrared and RGB images becomes especially difficult under complete modality missing, where the training data contains no cross-modality image pairs. In this work, we study intra-modality supervised person re-identification under complete modality missing, which trains on cross-modality unpaired data with intra-modality identity labels. This setting is challenging because cross-modality paired data plays a central role in learning modality-invariant representations in most existing Re-ID methods. To learn modality-invariant representations from cross-modality unpaired training data, we first develop a strong baseline with a dual-head cross-entropy loss and a multi-modality negative loss, aiming to alleviate cross-modality contrast and enhance intra-modality contrast. We then propose a residual modality alleviation network and a shape-guided consistency learning loss to further reduce the cross-modality representation discrepancy. Experiments conducted under the complete modality missing setting on the SYSU-MM01-CMM and RegDB-CMM datasets demonstrate the superiority of our method.
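Since the abstract only names the baseline's components, the snippet below is a minimal PyTorch-style sketch of one plausible reading, not the paper's implementation: a dual-head cross-entropy supervises each modality with its own (disjoint) identity labels, while a hypothetical multi-modality negative term treats unpaired RGB/infrared features as negatives and pushes their similarity down. All names (`DualHeadBaseline`, `multi_modality_negative_loss`) and the margin value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadBaseline(nn.Module):
    """Shared backbone with one identity-classifier head per modality.

    In the complete-modality-missing setting, RGB and infrared identities
    are disjoint, so each head has its own label space.
    """
    def __init__(self, backbone, feat_dim, num_ids_rgb, num_ids_ir):
        super().__init__()
        self.backbone = backbone                           # shared feature extractor
        self.head_rgb = nn.Linear(feat_dim, num_ids_rgb)   # RGB identity head
        self.head_ir = nn.Linear(feat_dim, num_ids_ir)     # infrared identity head

    def forward(self, x_rgb, x_ir):
        f_rgb = self.backbone(x_rgb)
        f_ir = self.backbone(x_ir)
        return f_rgb, f_ir, self.head_rgb(f_rgb), self.head_ir(f_ir)

def multi_modality_negative_loss(f_rgb, f_ir, margin=0.3):
    """Hypothetical negative term: with no cross-modality pairs available,
    treat every RGB/IR feature pair as a negative and penalize cosine
    similarity above a margin."""
    sim = F.normalize(f_rgb, dim=1) @ F.normalize(f_ir, dim=1).t()
    return F.relu(sim - margin).mean()

def baseline_loss(model, x_rgb, y_rgb, x_ir, y_ir):
    f_rgb, f_ir, logits_rgb, logits_ir = model(x_rgb, x_ir)
    # Dual-head cross-entropy: each modality supervised by its own labels.
    loss_ce = F.cross_entropy(logits_rgb, y_rgb) + F.cross_entropy(logits_ir, y_ir)
    return loss_ce + multi_modality_negative_loss(f_rgb, f_ir)

if __name__ == "__main__":
    # Toy usage with a trivial backbone and random unpaired batches.
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 32, 128))
    model = DualHeadBaseline(backbone, feat_dim=128, num_ids_rgb=10, num_ids_ir=10)
    x_rgb, x_ir = torch.randn(4, 3, 64, 32), torch.randn(4, 3, 64, 32)
    y_rgb, y_ir = torch.randint(0, 10, (4,)), torch.randint(0, 10, (4,))
    print(baseline_loss(model, x_rgb, y_rgb, x_ir, y_ir))
```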