Bi-Att3DDet: Attention-Based Bi-Directional Fusion for Multi-Modal 3D Object Detection

Abstract

Multi-modal 3D object detection has become a key research area in autonomous driving, and fusion is an essential factor in its performance. However, previous methods fail to effectively fuse features from LiDAR point clouds and RGB images, leading to low utilization of the complementary information between depth and semantic texture features. In addition, existing methods may not adequately capture structural information when extracting Region of Interest (RoI) features. Structural information plays a crucial role in RoI features: it encompasses the position, size, and orientation of objects, as well as the relative positions and spatial relationships between them, and its absence can cause false or missed detections. To address these problems, we propose a multi-modal sensor fusion network, Bi-Att3DDet, which mainly consists of a Self-Attentive RoI Feature Extraction module (SARoIFE) and a Feature Bidirectional Interactive Fusion module (FBIF). Specifically, SARoIFE uses the self-attention mechanism to capture relationships between different positions in RoI features, producing high-quality RoI features in preparation for the fusion stage. FBIF then performs bidirectional interaction between LiDAR and pseudo RoI features to make full use of their complementary information. We perform comprehensive experiments on the KITTI dataset; on the test set, our method improves performance by 1.55% at the hard difficulty level and by 0.19% in mean Average Precision (mAP).
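The two modules can be illustrated conceptually. The sketch below is not the paper's implementation (SARoIFE and FBIF presumably use learned query/key/value projections and richer fusion heads); it only shows, under simplified assumptions, the core operations the abstract describes: scaled dot-product self-attention over RoI features, and bidirectional cross-attention between the LiDAR and pseudo RoI feature streams. All function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attend(feats):
    """Simplified SARoIFE-style refinement: each RoI grid position
    attends to all other positions, injecting relational/structural
    context into the features. feats: (N, d) array."""
    d = feats.shape[-1]
    attn = softmax(feats @ feats.T / np.sqrt(d))  # (N, N) weights
    return attn @ feats                            # (N, d) refined

def bidirectional_fuse(lidar, pseudo):
    """Simplified FBIF-style fusion: each modality cross-attends to
    the other, then the mutually enriched streams are combined.
    lidar, pseudo: (N, d) RoI features for the same N proposals."""
    d = lidar.shape[-1]
    l_from_p = softmax(lidar @ pseudo.T / np.sqrt(d)) @ pseudo  # LiDAR <- pseudo
    p_from_l = softmax(pseudo @ lidar.T / np.sqrt(d)) @ lidar   # pseudo <- LiDAR
    return (lidar + l_from_p) + (pseudo + p_from_l)             # (N, d) fused

# Toy usage: 4 RoI positions with 8-dim features per modality.
rng = np.random.default_rng(0)
lidar_roi = self_attend(rng.normal(size=(4, 8)))
pseudo_roi = self_attend(rng.normal(size=(4, 8)))
fused = bidirectional_fuse(lidar_roi, pseudo_roi)
print(fused.shape)  # (4, 8)
```

In a real detector, the refined and fused (N, d) RoI features would feed the box-refinement and confidence heads; here the point is only that attention weights let every position and every modality condition on the other, which is how the complementary depth and texture cues are exploited.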
