Abstract
In autonomous driving, detecting small and occluded objects remains a substantial challenge due to the complexity of real-world environments. To address this, we propose RSO-YOLO, an enhanced model based on YOLOv12. First, a bidirectional feature pyramid network (BiFPN) and space-to-depth convolution (SPD-Conv) replace the original neck network. This design efficiently integrates multi-scale features while preserving fine-grained information during downsampling, improving both computational efficiency and detection performance. A detection head on the shallow P2 feature layer is also added, further strengthening the model's ability to detect small objects. Second, we propose the feature enhancement and compensation module (FECM), which enhances features in visible regions and compensates for missing semantic information in occluded areas, improving detection accuracy and robustness under occlusion. Finally, we propose a lightweight global cross-dimensional coordinate detection head (GCCHead), built upon the global cross-dimensional coordinate module (GCCM); by grouping features and enhancing them synergistically, this module balances computational efficiency with detection performance. Experimental results show that on the SODA10M, BDD100K, and FLIR ADAS datasets, RSO-YOLO improves mAP@0.5 by 8.0%, 10.7%, and 7.2%, respectively, over YOLOv12, while reducing the parameter count by 15.4% and the computational complexity by 20%. In summary, RSO-YOLO attains higher detection accuracy with fewer parameters and lower computational cost, highlighting its strong potential for practical autonomous driving applications.
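To make the downsampling claim concrete, the following is a minimal NumPy sketch of the space-to-depth rearrangement on which SPD-Conv is based: each block×block spatial patch is moved into the channel dimension, so spatial resolution is reduced without discarding any activations (unlike strided convolution or pooling); the subsequent non-strided convolution of SPD-Conv is omitted here, and the function name is illustrative rather than from the paper's code.

```python
import numpy as np

def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) feature map to (C*block*block, H/block, W/block).

    Every block x block spatial patch is folded into the channel axis,
    so no fine-grained information is lost during downsampling.
    """
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)  # (C, block, block, H/block, W/block)
    return x.reshape(c * block * block, h // block, w // block)

# Example: a 2-channel 4x4 map becomes an 8-channel 2x2 map.
x = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
y = space_to_depth(x, block=2)
print(y.shape)  # → (8, 2, 2)
```

Because the rearrangement is lossless, the output contains exactly the same values as the input, which is the property that lets the neck preserve small-object detail through each downsampling stage.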