Channel-shuffled transformers for cross-modality person re-identification in video

用于视频中跨模态行人重识别的通道重排变换器

阅读:1

Abstract

Effective implementation of person re-identification (Re-ID) across different modalities (such as daylight vs night-vision) is crucial for Surveillance applications. Information from multiple frames is essential for effective re-identification, where visual components from individual frames become less reliable. While transformers can enhance the temporal information extraction, the large number of channels required for effective feature encoding introduces scaling challenges. This could lead to overfitting and instability during training. Therefore, we proposed a novel Channel-Shuffled Temporal Transformer (CSTT) for processing multi-frame sequences in conjunction with a ResNet backbone to form Hybrid Channel-Shuffled Transformer Net (HCSTNET). Replacing fully connected layers in standard multi-head attention with ShuffleNet-like structures is important for integration of transformer attention with a ResNet backbone. Applying ShuffleNet-like structures reduces overfitting through parameter reduction with channel-grouping, and further improves learned attention using channel-shuffling. According to our tests with the SYSU-MM01 dataset in comparison against simple averaging of multiple frames, only the temporal transformer with channel-shuffling achieved a measurable improvement over the baseline. We have also investigated the optimal partitioning of feature maps therein.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。