Abstract
Object detection in unmanned aerial vehicle (UAV) imagery has gained significant traction in applications such as railway inspection and waste management. Although emerging end-to-end detectors such as DEIM show promise, they often suffer from weak feature responses and spatial misalignment in aerial scenarios. To address these issues, this paper proposes SCA-DEIM, a context-aware real-time detection framework. Specifically, we introduce the Adaptive Spatial and Channel Synergistic Attention (ASCSA) module, which refines existing attention paradigms by replacing a static gating mechanism with an active signal amplifier. Unlike traditional designs that impose rigid bounds on feature responses, ASCSA dynamically boosts the saliency of faint small-target signals against complex backgrounds. Furthermore, drawing inspiration from infrared small object detection, we propose the Cross-Stage Partial Shifted Pinwheel Mixed Convolution (CSP-SPMConv). By combining asymmetric padding with a spatial shift mechanism, this module aligns receptive fields and enforces cross-channel interaction, thereby addressing feature misalignment and scale fusion issues. Comprehensive experiments on the VisDrone2019 dataset demonstrate that, compared with the baseline model, SCA-DEIM improves Average Precision (AP) by 1.8%, AP for small objects (APs) by 2.3%, and AP for large objects (APl) by 2.0%, while maintaining a competitive inference speed. Visualization results under different illumination conditions further indicate that the model is robust to lighting changes. In addition, validation on the UAVVaste and UAVDT datasets confirms that the proposed method improves detection performance for small objects.