Abstract
Accurate scene understanding is crucial for the safe and stable operation of underground utility tunnel inspection. First, to address the low-light conditions typical of these environments, this paper proposes an object recognition method based on semantic segmentation of low-light-enhanced images. Second, using image data collected from real underground utility tunnel environments, a visual-language model is fine-tuned on scene images to generate textual scene descriptions. Third, these functions are integrated into a single system that processes captured images in real time and generates scene understanding results. In practical applications, the average accuracy of the improved recognition model increased by nearly 1% compared with the original model, while the accuracy and recall of the fine-tuned visual-language model exceeded those of the untuned model by more than 70%.