Abstract
Object detection in remote sensing images is challenging because of complex backgrounds and the diverse appearance of targets. In addition, targets vary widely in size and are often densely distributed within an image. To address these challenges, we propose an enhanced YOLOv8 model that integrates three key components. First, dynamic convolution (DyConv) replaces the ordinary convolutional layers in the C2f module of the backbone network and adaptively adjusts the convolution filters according to the input features, enhancing the model's ability to handle objects of different scales and appearances. Second, Bi-level Routing Attention (BRA) processes the high-level features of the backbone network, suppressing irrelevant background information, emphasizing effective features, and establishing semantic correlations. Third, the Asymptotic Feature Pyramid Network (AFPN) improves multi-scale feature fusion so that low-level small-object details are effectively fused with high-level semantic information. In experiments on the Remote Sensing Object Detection (RSOD) dataset, our improved YOLOv8 model achieves an mAP50-95 of 65.4%, an improvement of 3.3% over the original model. Compared with mainstream single-stage, two-stage, and DETR models, the proposed model improves detection accuracy while maintaining computational efficiency.