Abstract
The rapid evolution of drone technology has expanded its applications across collaborative control, public safety, and aerial imaging, yet reliable object detection remains challenging due to the small target sizes and complex backgrounds typical of drone-captured imagery. To address these limitations, this paper introduces MFA-YOLO, a high-precision network optimized specifically for small-object detection in drone imagery. The proposed approach integrates three innovations: a Local Feature Mapping (LFM) unit for enhanced fine-grained feature extraction, a Progressive Shared Atrous Pyramid (PSAP) for efficient multi-scale feature integration, and a Dynamic Decoupling Head (DDH) for improved adaptive task alignment. Together, these components enhance representational capacity while preserving real-time inference efficiency. Experimental evaluations on the VisDrone benchmark demonstrate a 3.6% increase in AP50, a 2.4% increase in AP, and a 17% reduction in model parameters compared with YOLOv8n. Additional experiments on UAVDT indicate promising generalization to similar drone datasets. These results highlight MFA-YOLO's potential to advance drone-based perception, making it more effective and efficient for safety-critical, real-time applications in resource-constrained UAV environments such as public safety monitoring, surveillance, and autonomous aerial operations.