Abstract
Real-time defect detection of high-speed railway catenary components remains challenging because many components are small (e.g., cotter pins, fasteners) and deployment platforms have limited computational resources. Existing YOLO-based models balance speed and accuracy, but they often struggle with small-object detection and incur high computational costs. To address these limitations, this paper proposes an optimized YOLOv11m model, termed MSIM-YOLOv11m, which integrates three modules: large separable kernel attention (LSKA) for enhanced feature extraction, a bidirectional feature pyramid network (BiFPN) for efficient multi-scale fusion, and adaptive kernel convolution (AKConv) for flexible feature learning. Experimental results on a dedicated catenary dataset show that the proposed model achieves a mAP50-95 of 78.3% and a small-object AP of 64.7%, while reducing computational cost by 50.5% compared with YOLOv9m. The model provides a lightweight and accurate solution suitable for real-time inspection applications. The code is available at https://github.com/1748125472/MSIM-Yolov11m/tree/master.
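The abstract does not say how BiFPN fusion is configured in MSIM-YOLOv11m. In the original BiFPN design (EfficientDet), "efficient multi-scale fusion" refers to fast normalized fusion, where each fused output is O = sum_i (w_i / (eps + sum_j w_j)) * I_i with learnable non-negative weights w_i. The sketch below illustrates only that weighting step. It assumes the input feature maps have already been resized to a common shape (the resampling and the convolution that follows fusion are omitted), and it uses eps = 1e-4. The function name `fast_normalized_fusion` is hypothetical and is not taken from the authors' repository.

```python
import numpy as np


def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """Hypothetical sketch of BiFPN-style fast normalized fusion.

    Computes O = sum_i (w_i / (eps + sum_j w_j)) * I_i, where each raw weight
    is passed through ReLU so the fusion weights stay non-negative.
    Assumes all feature maps in `inputs` already share the same shape.
    """
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input feature map")
    # ReLU keeps every learnable fusion weight non-negative
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)
    # Normalize so the weights sum to roughly 1, avoiding a softmax
    w = w / (eps + w.sum())
    fused = np.zeros_like(inputs[0], dtype=np.float64)
    for wi, feat in zip(w, inputs):
        fused += wi * feat
    return fused


# Usage: fuse a same-level feature map with an upsampled top-down feature
# (shapes here are illustrative: batch 1, 64 channels, 40x40 spatial grid).
p4_in = np.ones((1, 64, 40, 40))
p5_up = np.full((1, 64, 40, 40), 3.0)
out = fast_normalized_fusion([p4_in, p5_up], weights=[1.0, 1.0])
print(out.shape, round(float(out.mean()), 3))
```

With equal weights, each input contributes about half of the output. A negative raw weight is clipped to zero, which effectively switches that input off. The original BiFPN design chose this normalization over a softmax because it is cheaper to compute on accelerators.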