Abstract
To address false and missed detections in object detection for intelligent driving scenarios, this study optimizes the YOLOv10 algorithm to reduce model complexity while improving detection accuracy. The method makes three key improvements. First, it introduces a multi-scale flexible convolution (MSFC) that captures multi-scale information simultaneously, thereby reducing network stacking and computational load. Second, it reconstructs the neck network by incorporating Shallow Auxiliary Fusion (SAF) and Advanced Auxiliary Fusion (AAF), enabling better capture of the multi-scale features of objects. Third, it improves the detection head by combining multi-scale convolution with a channel-adaptive attention mechanism, enhancing the diversity and accuracy of feature extraction. Results show that the improved YOLOv10 model has a size of 13.4 MB, a reduction of 11.8%, and that its detection accuracy (mAP@0.5) reaches 93.0%, outperforming mainstream models in overall performance. This work provides a detection framework for intelligent driving scenarios that balances accuracy and model size.
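
The two size figures above imply a starting size that the abstract does not state. The following minimal sketch works it out, assuming the 11.8% is a relative reduction in model file size measured against the unmodified baseline model. The function name `implied_baseline_size` and the variable names are hypothetical and used only for illustration; the resulting baseline value is derived arithmetic, not a figure reported by the study.

```python
def implied_baseline_size(reduced_size_mb: float, reduction_pct: float) -> float:
    """Return the size before a relative reduction, given the size after it.

    Assumes reduced = baseline * (1 - reduction_pct / 100).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return reduced_size_mb / (1 - reduction_pct / 100)


improved_mb = 13.4    # reported size of the improved model
reduction_pct = 11.8  # reported relative reduction (assumed vs. the baseline)

baseline_mb = implied_baseline_size(improved_mb, reduction_pct)
saved_mb = baseline_mb - improved_mb

print(f"implied baseline: {baseline_mb:.1f} MB, saved: {saved_mb:.1f} MB")
# prints "implied baseline: 15.2 MB, saved: 1.8 MB"
```

Under this assumption, the reported figures correspond to a baseline of roughly 15.2 MB, so the three modifications together remove about 1.8 MB.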