CrossModalSync: joint temporal-spatial fusion for semantic scene segmentation in large-scale scenes


Abstract

Owing to its ability to enable precise perception of dynamic and complex environments, point cloud semantic segmentation has become a critical task for autonomous vehicles in recent years. However, in complex, dynamic scenes, existing semantic segmentation methods suffer from cumulative errors and the "many-to-one" mapping problem, which limit their accuracy and efficiency. To address these challenges, this paper introduces a new framework that balances accuracy and computational efficiency through temporal alignment (TA), projection multi-scale convolution (PMC), and priority point retention (PPR). First, by combining TA and PMC, the framework effectively captures inter-frame correlations, enhancing local detail, reducing error accumulation, and preserving fine scene features. Second, the PPR mechanism ensures that critical three-dimensional information is retained, resolving the information loss caused by the "many-to-one" mapping problem. Finally, by fusing LiDAR and camera data, the framework provides complementary perspectives that further improve segmentation performance. Our method achieves state-of-the-art performance on the SemanticKITTI and nuScenes benchmarks, and is notably effective at detecting occluded objects and dynamic entities.
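To make the "many-to-one" mapping problem concrete: when a LiDAR point cloud is projected onto a 2D range image, several 3D points at the same azimuth and elevation can land in the same pixel, so a naive projection silently discards all but one of them. The sketch below illustrates this collision and a simple depth-priority retention rule (keep the nearest point per pixel). The paper's actual PPR mechanism is not described in the abstract, so the priority criterion here is an assumption for illustration only; the grid size and field-of-view parameters are likewise arbitrary toy values.

```python
import numpy as np

def spherical_project(points, H=4, W=8,
                      fov_up=np.deg2rad(15), fov_down=np.deg2rad(-25)):
    """Map 3D points to (row, col) pixels of a toy H x W range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                    # azimuth angle
    pitch = np.arcsin(z / depth)              # elevation angle
    u = ((yaw + np.pi) / (2 * np.pi)) * W     # column from azimuth
    v = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H  # row from elevation
    u = np.clip(np.floor(u).astype(int), 0, W - 1)
    v = np.clip(np.floor(v).astype(int), 0, H - 1)
    return v, u, depth

# Two points along the same ray but at different distances collide in
# one pixel: the "many-to-one" mapping problem.
points = np.array([
    [10.0, 0.0, 0.0],   # far point
    [ 2.0, 0.0, 0.0],   # near point, same direction -> same pixel
    [ 0.0, 5.0, 1.0],   # unrelated point, lands elsewhere
])
v, u, depth = spherical_project(points)

# Depth-priority retention (illustrative rule, not the paper's PPR):
# for each occupied pixel, keep only the nearest point.
best = {}
for i, (vi, ui, di) in enumerate(zip(v, u, depth)):
    key = (vi, ui)
    if key not in best or di < depth[best[key]]:
        best[key] = i

kept = sorted(best.values())
print(kept)  # → [1, 2]: the near point wins its pixel; the far one is dropped
```

Any per-pixel priority (nearest depth, highest intensity, a learned score) fits the same loop; the point of PPR-style mechanisms is that the retention rule is chosen so that the 3D information most critical for segmentation survives the projection.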
