Abstract
To address the low accuracy and the high false-detection and missed-detection rates of existing pavement crack identification methods under complex road conditions, this paper proposes YOLO11-MBC, an approach built on the YOLO11 model. First, a Multi-scale Feature Fusion Backbone Network (MFFBN) is designed to strengthen the model's ability to extract and recognize crack features in complex environments. Second, because pavement cracks often have elongated topologies and are easily confused with visually similar features such as tree roots or lane markings, the Bidirectional Feature Pyramid Network (BiFPN) is combined with a Multimodal Cross-Attention (MCA) mechanism to form BiMCNet, which replaces the Concat layer in the original network and improves the detection of minute cracks. Third, the CGeoCIoU loss function replaces the original CIoU loss, using three distinct penalty terms to better reflect the alignment between predicted and ground-truth boxes. The method is validated through comparative and ablation experiments on the public RDD2022 dataset. The results show that: (1) with the three proposed modules integrated, YOLO11-MBC improves F1-score by 22.5% and mAP50 by 8% over the baseline YOLO11, a substantial gain for pavement crack detection under complex conditions; (2) YOLO11-MBC outperforms YOLOv8, YOLOv10, and YOLO11, achieving precision, recall, F1-score, mAP50, and mAP50-95 of 61%, 70%, 72%, 75%, and 66%, respectively, which supports the effectiveness of the proposed approach.
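For orientation, the baseline CIoU loss that CGeoCIoU replaces can be sketched as follows. This is a minimal illustration of the standard CIoU formulation (overlap, centre-distance, and aspect-ratio terms), not the CGeoCIoU loss itself, whose three penalty terms are not specified in this abstract. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabiliser are assumptions made for the example.

```python
import math

def ciou_loss(box_p, box_g, eps=1e-9):
    """Standard CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative sketch only: this is the baseline CIoU, not CGeoCIoU.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap term: intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Centre-distance term: squared distance between box centres,
    # normalised by the squared diagonal of the smallest enclosing box.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0

    # Aspect-ratio term: penalises a mismatch in width/height ratio.
    v = (4.0 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v
```

For thin, elongated boxes such as those around cracks, a small offset or a small change in height changes the IoU sharply, which is one reason the choice of box-regression penalty terms matters for this task.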