LST-BEV: Generating a Long-Term Spatial-Temporal Bird's-Eye-View Feature for Multi-View 3D Object Detection

LST-BEV:生成用于多视角3D目标检测的长期时空鸟瞰图特征

阅读:1

Abstract

This paper presents a novel multi-view 3D object detection framework, Long-Term Spatial-Temporal Bird's-Eye View (LST-BEV), designed to improve performance in autonomous driving. Traditional 3D detection relies on sensors like LiDAR, but visual perception using multi-camera systems is emerging as a more cost-effective solution. Existing methods struggle with capturing long-range dependencies and cross-task information due to limitations in attention mechanisms. To address this, we propose a Long-Range Cross-Task Detection Head (LRCH) to capture these dependencies and integrate cross-task information for accurate predictions. Additionally, we introduce the Long-Term Temporal Perception Module (LTPM), which efficiently extracts temporal features by combining Mamba and linear attention, overcoming challenges in temporal frame extraction. Experimental results in the nuScenes dataset demonstrate that our proposed LST-BEV outperforms its baseline (SA-BEVPool) by 2.1% mAP and 2.7% NDS, indicating a significant performance improvement.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。