Abstract
Unmanned Aerial Vehicle (UAV) aerial images exhibit complex backgrounds, significant scale variations among targets, and dense distributions of small objects, conditions that traditional object detection algorithms struggle to handle. This article introduces a drone detection model, MEP-YOLOv5s, which optimizes the Backbone, Neck layer, and C3 module of YOLOv5s, incorporates effective attention mechanisms, and replaces the traditional CIoU (Complete Intersection over Union) loss with the MPDIoU (Minimum Point Distance-based Intersection over Union) loss to improve training efficiency. The model demonstrates excellent performance in typical drone detection scenarios, especially for small and dense objects. To holistically balance detection accuracy and inference efficiency, we propose a Comprehensive Performance Indicator (CPI), which evaluates model performance by jointly considering both aspects. Evaluations on the VisDrone2019 dataset demonstrate that MEP-YOLOv5s achieves a 3.3% improvement in precision (P), a 20.9% increase in mAP@0.5, and a 19.86% gain in CPI (α = 0.5) compared with the baseline model. Additional experiments on the NWPU VHR-10 dataset confirm that MEP-YOLOv5s outperforms existing state-of-the-art methods, offering a robust solution for UAV-based small object detection with enhanced feature extraction and attention-driven adaptability.
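To make the loss substitution concrete, the following is a minimal sketch of the MPDIoU formulation for axis-aligned boxes, following the published definition (MPDIoU = IoU minus the squared top-left and bottom-right corner distances, each normalized by the squared input image dimensions); the function and variable names are our own illustration, not the paper's implementation.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """MPDIoU loss for boxes in (x1, y1, x2, y2) format.

    MPDIoU = IoU - d1^2/(w^2 + h^2) - d2^2/(w^2 + h^2), where d1, d2 are
    the top-left and bottom-right corner distances between the predicted
    and ground-truth boxes, and (w, h) is the input image size.
    The loss is 1 - MPDIoU.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = target

    # Intersection-over-Union of the two boxes
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((px2 - px1) * (py2 - py1)
             + (gx2 - gx1) * (gy2 - gy1) - inter)
    iou = inter / union if union > 0 else 0.0

    # Squared corner distances, normalized by the image size term
    norm = img_w ** 2 + img_h ** 2
    d1 = (px1 - gx1) ** 2 + (py1 - gy1) ** 2  # top-left corners
    d2 = (px2 - gx2) ** 2 + (py2 - gy2) ** 2  # bottom-right corners

    mpdiou = iou - d1 / norm - d2 / norm
    return 1.0 - mpdiou
```

Unlike CIoU, which combines overlap, center distance, and aspect-ratio terms, this formulation penalizes corner misalignment directly, which simplifies the geometry for small boxes.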