TCCDNet: A Multimodal Pedestrian Detection Network Integrating Cross-Modal Complementarity with Deep Feature Fusion


Abstract

Multimodal pedestrian detection has garnered significant attention for its potential in complex scenarios, as the complementary characteristics of the infrared and visible modalities can enhance detection performance. However, designing cross-modal fusion mechanisms and fully exploiting inter-modal complementarity remain challenging. To address this, we propose TCCDNet, a novel network that integrates cross-modal complementarity with deep feature fusion. Specifically, an efficient multi-scale attention C2f (EMAC) block is designed for the backbone; it combines the C2f structure with an efficient multi-scale attention mechanism to weight and fuse features, strengthening the model's feature extraction capacity. A cross-modal complementarity (CMC) module is then proposed, which improves feature discriminability and object localization accuracy through a synergistic combination of channel and spatial attention. Additionally, a deep semantic fusion module (DSFM) based on a cross-attention mechanism performs deep semantic feature fusion. Experiments show that TCCDNet achieves an MR⁻² of 7.87% on the KAIST dataset, a 3.83-percentage-point reduction relative to YOLOv8. On two further multimodal pedestrian detection datasets, TCCDNet attains mAP50 scores of 83.8% on FLIR ADAS and 97.3% on LLVIP, outperforming the baseline by 3.6 and 1.9 percentage points respectively. These results validate the effectiveness of the proposed method.
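The abstract does not specify the internals of the CMC module, but the described combination of channel and spatial attention for cross-modal fusion can be illustrated with a minimal NumPy sketch. Everything below is a hypothetical reconstruction, not the authors' implementation: `cmc_fuse`, the cross-gating of modalities, and the sigmoid pooling choices are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cmc_fuse(vis, ir):
    """Hypothetical cross-modal complementarity fusion (shapes: C, H, W).

    Channel attention derived from one modality reweights the other
    (cross-gating), then a spatial attention map refines the fused result.
    """
    # Channel attention: global average pool over H, W -> sigmoid gate.
    ca_vis = sigmoid(ir.mean(axis=(1, 2)))[:, None, None]   # IR gates visible
    ca_ir = sigmoid(vis.mean(axis=(1, 2)))[:, None, None]   # visible gates IR
    fused = vis * ca_vis + ir * ca_ir

    # Spatial attention: channel-wise mean -> sigmoid map over H, W.
    sa = sigmoid(fused.mean(axis=0, keepdims=True))
    return fused * sa

# Toy example with random "feature maps" of 8 channels at 16x16 resolution.
vis = np.random.rand(8, 16, 16)
ir = np.random.rand(8, 16, 16)
out = cmc_fuse(vis, ir)
assert out.shape == (8, 16, 16)
```

In a real detector these operations would act on learned backbone features (e.g. inside a PyTorch module with learnable pooling/projection layers); the sketch only shows the attention-weighting structure the abstract describes.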
