Abstract
Environmental perception is crucial for the autonomous driving of auxiliary haulage vehicles in underground coal mines. The complex underground environment and working conditions, such as dust pollution, uneven lighting, and sensor data anomalies, pose challenges to multimodal fusion perception: (1) there is no sound, effective method for evaluating the reliability of data from different modalities; (2) existing fusion methods do not fuse modalities deeply enough to tolerate sensor failures; and (3) no multimodal underground coal mine dataset exists to support model training. To address these issues, this paper proposes a BEV multiscale-enhanced fusion perception model for underground coal mines based on dynamic weight adjustment. First, camera and LiDAR data are uniformly mapped into BEV space to align the multimodal features. Then, a Mixture of Experts-Fuzzy Logic Inference Module (MoE-FLIM) is designed to infer weights for each modality from its BEV feature dimensions. Next, a Pyramid Multiscale Feature Enhancement and Fusion Module (PMS-FFEM) is introduced to preserve perception performance when sensor data are abnormal. Finally, a multimodal underground coal mine dataset is constructed to support model training and testing in real-world scenarios. Experimental results show that the proposed method achieves good accuracy and stability on object-detection tasks in underground coal mine environments, maintaining high detection performance even in typical complex scenes such as low light and dust fog.
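To make the dynamic-weight fusion idea concrete, the following is a minimal NumPy sketch, not the paper's MoE-FLIM implementation: a toy reliability score per modality (here, mean feature energy) is turned into softmax weights that blend two BEV feature maps, so a degraded sensor (e.g. a dust-obscured camera) is automatically down-weighted. The function names, the energy-based score, and the temperature parameter are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def reliability_score(feat):
    """Toy per-modality reliability proxy: mean absolute feature energy.
    A degraded sensor yields an attenuated BEV feature map and thus a
    low score (an assumption; the paper infers weights with MoE-FLIM)."""
    return float(np.mean(np.abs(feat)))

def fuse_bev(cam_bev, lidar_bev, temperature=1.0):
    """Blend two aligned BEV feature maps with dynamic weights.

    cam_bev, lidar_bev: arrays of identical shape (C, H, W), assumed
    already projected into a shared BEV grid."""
    scores = np.array([reliability_score(cam_bev),
                       reliability_score(lidar_bev)])
    w = softmax(scores / temperature)  # weights sum to 1
    fused = w[0] * cam_bev + w[1] * lidar_bev
    return fused, w

# Example: healthy LiDAR, camera degraded by dust (features attenuated).
rng = np.random.default_rng(0)
lidar_bev = rng.normal(size=(4, 8, 8))
cam_bev = 0.1 * rng.normal(size=(4, 8, 8))
fused, w = fuse_bev(cam_bev, lidar_bev)
```

In this sketch the fusion degrades gracefully: as one modality's features collapse toward zero, its weight shrinks and the output is dominated by the remaining sensor, which mirrors the robustness goal stated for PMS-FFEM under sensor abnormalities.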