Abstract
Object detection in remote sensing imagery is challenging because of complex backgrounds, the prevalence of small objects, and high instance density, all of which hinder both detection accuracy and computational efficiency. To address these issues, we propose an enhanced version of the YOLOv9 architecture designed specifically for remote sensing image analysis. Our model incorporates four key changes: a multi-scale feature integration module (C3) that jointly captures fine-grained details and high-level semantic information; a channel attention mechanism (the Squeeze-and-Excitation module) that adaptively emphasizes informative features while suppressing irrelevant background responses; an additional detection head (P2) that improves small-object recognition; and the Generalized Intersection over Union (GIoU) loss for more accurate bounding box regression and faster training convergence. Extensive experiments on the SIMD dataset show that our model achieves state-of-the-art performance, with 86.6% mAP@0.5 and 71.5% mAP@0.5:0.95 at 84.0 FPS, significantly outperforming the baseline YOLOv9. The model also reduces the number of parameters by 21.2%, highlighting its efficiency. These results make our model an effective solution for real-world remote sensing applications such as environmental monitoring, urban planning, and military reconnaissance.
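
As a minimal illustration of the GIoU loss named above, the following sketch computes it for a single pair of axis-aligned boxes. It is not the implementation used in our model: the function names `giou` and `giou_loss` and the `(x1, y1, x2, y2)` corner format are assumptions made for this example, and a training pipeline would use a batched tensor version instead.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; assumes x2 > x1 and y2 > y1.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    if area_a <= 0 or area_b <= 0:
        raise ValueError("boxes must have positive area")

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    iou = inter / union
    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(box_pred, box_true):
    """GIoU loss, 1 - GIoU, which lies in the range [0, 2)."""
    return 1.0 - giou(box_pred, box_true)
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, GIoU becomes more negative as the boxes move apart, so the loss still varies with the predicted box position when the prediction misses the target entirely.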