MSCNet: Efficient and accurate semantic segmentation of LiDAR data using Multi-scale Convolution


Abstract

In autonomous driving and intelligent robotics, the semantic information of LiDAR (Light Detection and Ranging) sensor data is crucial for understanding the surrounding environment. However, operating directly on point clouds is computationally expensive. To address this, some researchers have projected three-dimensional LiDAR data onto a two-dimensional spherical range view and used two-dimensional convolutional neural networks to segment the projected images. While the results are promising, many of these models are structurally complex, with high time and space complexity, which makes them unsuitable for real-time applications. To solve these issues, this paper proposes MSCNet, a multi-scale semantic segmentation method for LiDAR data with fewer parameters and higher segmentation accuracy. In the encoding phase, a single-channel multi-scale feature fusion block is introduced to alleviate the distribution differences between input channels. To obtain more stable local features, multi-scale dilated convolution residual blocks are designed to encode information from different receptive fields. To capture global features quickly, a pyramid pooling module is introduced. Experimental results on the SemanticKITTI, SemanticPOSS, and Pandaset datasets show that MSCNet achieves a good balance among parameter count, accuracy, and running time. On the SemanticPOSS and Pandaset datasets in particular, MSCNet achieves the best performance. Under the same parameter budget, this method outperforms existing point-based and projection-based methods.
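The spherical range-view projection the abstract refers to can be sketched as follows. This is a minimal illustration of the standard projection used by range-image segmentation methods, not code from the MSCNet paper; the image resolution and vertical field-of-view bounds are illustrative assumptions typical of a 64-beam sensor.

```python
import numpy as np

def spherical_range_projection(points, H=64, W=1024,
                               fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud onto a 2-D spherical range image.

    Returns an (H, W) array holding the range of the nearest point that
    falls into each pixel; empty pixels hold -1.  The field-of-view and
    resolution defaults are illustrative, not taken from the paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)      # range of each point

    yaw = np.arctan2(y, x)                  # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                # elevation angle

    fov_up = np.radians(fov_up_deg)
    fov = fov_up - np.radians(fov_down_deg)

    # Map azimuth to columns and elevation to rows, then discretise.
    u = 0.5 * (1.0 - yaw / np.pi) * W
    v = (fov_up - pitch) / fov * H
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int64)

    image = np.full((H, W), -1.0, dtype=np.float32)
    # Write points far-to-near so closer points overwrite farther ones.
    order = np.argsort(r)[::-1]
    image[v[order], u[order]] = r[order]
    return image
```

Once the cloud is in this 2-D form, an ordinary 2-D CNN (such as the multi-scale encoder the abstract describes) can segment it far more cheaply than per-point 3-D operators.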
