RGB-FIR Multimodal Pedestrian Detection with Cross-Modality Context Attentional Model

Abstract

Pedestrian detection is an important research topic in visual cognition and autonomous driving systems. The introduction of the YOLO model significantly improved detection speed and accuracy. To achieve all-day detection performance, multimodal YOLO models based on RGB-FIR image pairs have become a research hotspot. Existing work has focused on designing fusion modules that operate after the RGB and FIR branch backbone networks have extracted features, yielding multimodal backbone frameworks based on back-end fusion. However, these methods overlook the complementarity and prior knowledge between modalities and scales during front-end raw feature extraction in the RGB and FIR branches. As a result, the performance of a back-end fusion framework depends largely on the representation ability of each modality's raw front-end features. This paper proposes a novel RGB-FIR multimodal backbone network framework based on a cross-modality context attentional model (CCAM). Unlike existing work, it adopts a multi-level fusion framework. At the front end of the RGB-FIR parallel backbone network, a CCAM is constructed for the raw features at each scale. The RGB-FIR fusion result of the lower-level features from the two branches is used to optimize the spatial weights of the upper-level RGB and FIR features, achieving cross-modality and cross-scale complementarity between feature extraction modules at adjacent scales. At the back end of the RGB-FIR parallel network, a channel-spatial joint attention model (CBAM) and self-attention models are combined to produce the final RGB-FIR fusion features at each scale from the RGB and FIR features optimized by CCAM.
Comparative experiments against current RGB-FIR multimodal YOLO models, using several performance evaluation metrics on multiple public RGB-FIR datasets, indicate that the proposed method significantly improves the accuracy and robustness of pedestrian detection.
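
The front-end idea described above, where a fused lower-level RGB-FIR feature map supplies spatial weights for the upper-level features of both branches, can be illustrated with a minimal sketch. The abstract does not specify the actual operations inside CCAM, so everything here is an assumption made for illustration: fusion is a plain element-wise average, the spatial weight is a sigmoid of the channel mean after 2x2 average pooling, and the function names (`fuse`, `spatial_weight`, `reweight`, `ccam_step`) are hypothetical. Feature maps are nested lists shaped [channel][row][column], and the lower level is assumed to have twice the spatial resolution of the upper level.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(rgb, fir):
    """Placeholder fusion (assumption): element-wise average of two [C][H][W] maps."""
    return [
        [[(a + b) / 2.0 for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(rgb, fir)
    ]


def spatial_weight(fused):
    """Turn a fused [C][H][W] map into an [H/2][W/2] weight map in (0, 1).

    Assumed recipe: mean over channels, 2x2 average pooling, then sigmoid.
    H and W are assumed to be even.
    """
    c, h, w = len(fused), len(fused[0]), len(fused[0][0])
    mean = [[sum(fused[k][i][j] for k in range(c)) / c for j in range(w)]
            for i in range(h)]
    return [
        [sigmoid((mean[2 * i][2 * j] + mean[2 * i][2 * j + 1]
                  + mean[2 * i + 1][2 * j] + mean[2 * i + 1][2 * j + 1]) / 4.0)
         for j in range(w // 2)]
        for i in range(h // 2)
    ]


def reweight(feat, weight):
    """Scale every channel of a [C][H][W] map by the same [H][W] spatial weight."""
    return [
        [[v * weight[i][j] for j, v in enumerate(row)]
         for i, row in enumerate(ch)]
        for ch in feat
    ]


def ccam_step(rgb_low, fir_low, rgb_up, fir_up):
    """Use the fused lower-level features to reweight both upper-level branches."""
    w = spatial_weight(fuse(rgb_low, fir_low))
    return reweight(rgb_up, w), reweight(fir_up, w)


# Tiny usage example: one channel, 2x2 lower level, 1x1 upper level.
rgb_low = [[[0.0, 0.0], [0.0, 0.0]]]
fir_low = [[[0.0, 0.0], [0.0, 0.0]]]
rgb_up = [[[4.0]]]
fir_up = [[[2.0]]]
rgb_out, fir_out = ccam_step(rgb_low, fir_low, rgb_up, fir_up)
```

In this sketch both branches share one weight map derived from the fused lower level, which is one simple way to let information from each modality influence the other at the next scale; a real implementation would presumably use learned convolutions rather than fixed averaging.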
