Abstract
LiDAR point cloud semantic segmentation is pivotal for scan-to-BIM workflows; however, contemporary deep learning approaches remain constrained by their reliance on extensive annotated datasets, which are difficult to acquire in real construction environments owing to prohibitive labeling costs, structural occlusion, and sensor noise. This study proposes a BIM-guided Virtual-to-Real (V2R) framework that requires no real annotations. The method is trained entirely on a large synthetic point cloud (SPC) dataset of 132 scans and approximately 8.75×10⁹ points, generated directly from BIM models with component-level labels. A multi-feature fusion network combines the global contextual modeling of the Point Cloud Transformer (PCT) with the local geometric encoding of PointNet++, producing robust representations across scales. A learnable point cloud augmentation module and multi-level domain adaptation strategies are incorporated to mitigate differences in noise, density, occlusion, and structural variation between synthetic and real scans. Experiments on real construction floors of high-rise residential buildings, together with the BIM-Net benchmark, show that the proposed method achieves 70.89% overall accuracy, 53.14% mean IoU, 69.67% mean accuracy, 54.75% FWIoU, and a Cohen's κ of 59.66%, consistently outperforming baseline models. The Fusion model attains 73 of 80 best scene-level metric results and 31 of 70 best component-level scores, demonstrating stable performance across the evaluated scenes and floors. These results confirm the effectiveness of BIM-generated SPC and indicate the potential of the V2R framework for BIM-reality updates and automated site monitoring in similar building contexts.