Abstract
With the rapid development of remote sensing technology, optical remote sensing images are increasingly used in military reconnaissance, environmental monitoring, and urban planning. Small objects in these images occupy few pixels, have blurred features, and appear against complex backgrounds, which makes it difficult for conventional convolutions to extract their features effectively. To address this problem, we propose DCN-YOLO, a novel object detection method that uses multi-scale dilated convolutions to enlarge the receptive field of the model so that it adapts to changes in object size, captures multi-scale contextual information from the feature map, and extracts richer object features. First, we propose a Dilated Convolutional Residual (DCR) module for high-level feature extraction in the network. Second, we introduce a context aggregation (CONTEXT) module that uses long-range interactions to aggregate context across the image, allowing the model to capture the global semantic information of the image. DCN-YOLO achieves an AP50 of 56.6 on the AI-TOD dataset, improving the accuracy and robustness of small object detection in remote sensing images and providing a new technical approach to this task.
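To illustrate why dilation enlarges the receptive field without adding parameters, the following minimal sketch computes the input span covered by a dilated kernel using the standard relation k_eff = k + (k - 1)(d - 1). The function names, the 3x3 kernel size, and the dilation rates 1, 2, and 3 are assumptions chosen for illustration; they are not taken from the DCR module, whose exact configuration is not given here.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by a single dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers) -> int:
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs.
    Each stride-1 layer widens the receptive field by (effective kernel size - 1).
    """
    receptive_field = 1
    for kernel_size, dilation in layers:
        receptive_field += effective_kernel_size(kernel_size, dilation) - 1
    return receptive_field


if __name__ == "__main__":
    # Hypothetical dilation rates, used only to show the trend.
    for dilation in (1, 2, 3):
        span = effective_kernel_size(3, dilation)
        print(f"3x3 kernel, dilation {dilation}: covers {span}x{span} input pixels")

    # The same three layers applied in sequence rather than in parallel.
    print("stacked:", stacked_receptive_field([(3, 1), (3, 2), (3, 3)]))
```

Each 3x3 kernel in this sketch has nine weights regardless of its dilation rate, so a wider span costs no extra parameters. Whether the branches run in parallel or in sequence is a design choice that this sketch leaves open.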