Abstract
Object detection in aerial drone imagery has attracted increasing attention in Unmanned Aerial Vehicle (UAV) sensing applications. However, objects in such imagery are often small, occupy limited image regions, vary widely in scale, and are easily confused with similar-looking backgrounds, all of which make them difficult to detect. Meanwhile, the constrained computing power of onboard platforms places strict requirements on the speed and efficiency of the algorithm. In this paper, we propose ESO-Det, an efficient object detection network for real-time UAV perception. Our approach introduces three key innovations: (1) a Dense Cross-branch Complementary Module, a lightweight module that dynamically integrates semantic and spatial information to improve the network's understanding of scene details; (2) a Large-Kernel Context Integration Module, which expands the receptive field to effectively aggregate multi-scale contextual information; and (3) a Lightweight Selective Aggregation Module, which selectively aggregates the fused multi-scale features through different functional branches. Extensive experiments demonstrate that the proposed method achieves higher performance than representative existing approaches while maintaining real-time processing capability, indicating that it is suitable for real-time UAV object detection.