Abstract
Simultaneous online mapping and semantic segmentation using handheld scanners supports various environmental inspection and measurement tasks. For such scanners, combining visual and LiDAR data is beneficial for improving segmentation performance. However, the direct fusion of multi-modal and multi-view features faces challenges in terms of both real-time performance and robustness. To address these challenges, this paper proposes a multi-view and cross-modal knowledge distillation method that supports LiDAR-only semantic segmentation at runtime. The proposed method hierarchically compacts multi-view and cross-modal priors and distills them into two branches to improve segmentation accuracy. In addition, we design an improved data augmentation technique based on PolarMix to render more realistic point cloud scenes. Experimental results on the SemanticKITTI and nuScenes datasets demonstrate that our approach outperforms state-of-the-art knowledge-distillation-based methods in mIoU. Furthermore, mapping experiments using a handheld scanner demonstrate the proposed method's superior real-time performance and accuracy.