Abstract
This paper proposes an optimized algorithm based on YOLOv11s to address insufficient vehicle detection accuracy from a drone perspective, a problem that arises in scenes with complex backgrounds, dense vehicle targets, or large variations in target scale caused by oblique imaging. The proposed algorithm enhances the model's local feature extraction capability through a module collaboration optimization strategy, integrates coordinate convolution to strengthen spatial perception, and introduces a small object detection head to handle target size variations caused by altitude changes. Additionally, we construct a dedicated dataset for urban vehicle detection characterized by high-resolution images, a large sample size, and low training resource requirements. Experimental results show that, compared to the baseline network, the proposed algorithm achieves gains of 1.9% in precision, 6.0% in recall, 4.2% in mAP@0.5, and 3.3% in mAP@0.5:0.95. The improved model also achieves the highest F1-score, indicating the best balance between precision and recall.
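For reference, the F1-score cited above is the standard harmonic mean of precision and recall. Writing $P$ for precision and $R$ for recall (notation introduced here for illustration, not taken from the paper):

\[
F_1 = \frac{2\,P\,R}{P + R}
\]

Because the harmonic mean never exceeds twice the smaller of its two arguments, a high F1-score requires precision and recall to be high together, which is why it is commonly read as a measure of their balance.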