Abstract
Small target detection in remote sensing images is challenging because of complex backgrounds, weak target features, and large variations in scale. This paper proposes ClearSight-RS ("Clear and Accurate Small-target Insight for Remote Sensing"), an improved YOLOv5-based network designed for clear feature perception and accurate localization of small targets in remote sensing images. The improvements cover three aspects. First, an improved Dynamic Snake Convolution (DSConv) module is integrated into the backbone to strengthen the extraction of small-target boundaries and geometric features and the representation of weak textures. Second, a Bi-Level Routing Attention (BRA) module is embedded in the neck to sharpen the focus on targets and suppress background interference. Third, the detection head is optimized by retaining only the shallow, high-resolution feature layers for prediction, which reduces feature loss and redundant computation. Experimental results show that on the VEDAI dataset, ClearSight-RS achieves the highest mAP in all 8 vehicle categories; on the NWPU VHR-10 dataset, its overall mAP reaches 93.8%, significantly outperforming algorithms such as Faster R-CNN and YOLOv5l; and on the DOTA dataset, the experiments demonstrate the ability of the BRA module to suppress background interference and capture small-target features. The network balances accuracy and efficiency and performs well in detecting vehicles and multi-category small targets against complex backgrounds, which verifies its effectiveness.
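As a rough illustration of why the detection head keeps only shallow, high-resolution feature layers, the sketch below computes how many grid cells a small target spans at each detection stride. The 640-pixel input, the 16-pixel target, and the helper names `footprint_per_stride` and `keep_strides` are assumptions made for illustration; the strides 8, 16, and 32 are those of the stock YOLOv5 heads, and the one-cell threshold is a hypothetical heuristic, not the layer-selection rule used by ClearSight-RS.

```python
# Illustrative sketch only: input size, target size, and the one-cell
# threshold are assumptions, not values taken from the paper.

def footprint_per_stride(img_size, strides, obj_px):
    """For each detection stride, report the feature-map side length and
    how many grid cells one side of the target spans."""
    rows = []
    for s in strides:
        rows.append({
            "stride": s,
            "grid": img_size // s,   # feature-map side length at this stride
            "cells": obj_px / s,     # target side length measured in grid cells
        })
    return rows


def keep_strides(rows, min_cells=1.0):
    """Hypothetical heuristic: keep heads where the target spans >= min_cells."""
    return [r["stride"] for r in rows if r["cells"] >= min_cells]


# Stock YOLOv5 detection strides, 640x640 input, 16-pixel target (assumed).
rows = footprint_per_stride(640, [8, 16, 32], obj_px=16)
for r in rows:
    print(r)
print("heads kept:", keep_strides(rows))
```

Under these assumptions the target covers less than one cell at the deepest stride, so a prediction layer at that resolution has little spatial evidence to work with; this is the intuition behind predicting from shallower layers only.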