Abstract
Detecting small targets in remote sensing images remains challenging for traditional lightweight methods because feature representation capability conflicts with tight computational constraints. To address this, this paper proposes LMW-YOLO, a lightweight, high-precision detection network built upon the YOLO11n baseline. We introduce a novel CSD strategy that tailors feature extraction to the distinct requirements of different network stages. Guided by this strategy, we first design the LKCA module for the shallow P3 branch. This module applies decomposed LKA to capture the long-range dependencies and global context essential for small targets, compensating for the limited receptive fields of standard convolutions. Next, to handle semantic ambiguity in deeper layers, we introduce the MSDP module in the P4 branch, which expands the receptive field to capture multi-scale semantic context without sacrificing spatial resolution. Finally, the WIoU v3 loss function is adopted to optimize bounding box regression. Through a dynamic non-monotonic focusing mechanism, WIoU v3 reassigns gradient gain according to anchor box quality, which accelerates convergence and improves localization accuracy in dense scenes. Experimental results on the NWPU VHR-10, RS-STOD, and VisDrone2019 datasets demonstrate that LMW-YOLO achieves superior detection performance compared with state-of-the-art methods while using only 2.6 M parameters, validating its effectiveness for resource-constrained aerial applications.
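As an illustration of the dynamic non-monotonic focusing mechanism mentioned above, the following is a minimal Python sketch of the WIoU v3 loss as it is commonly formulated, not the implementation used in LMW-YOLO. The function names, the (x1, y1, x2, y2) box layout, the hyperparameter values alpha = 1.9 and delta = 3, and the choice to pass the running mean IoU loss in as an argument are all assumptions made for this example.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred, target):
    """Distance-weighted IoU loss; returns (L_WIoUv1, L_IoU)."""
    l_iou = 1.0 - iou(pred, target)
    # Box centers.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    # (In a training framework this term is usually detached from the graph;
    # assumes non-degenerate boxes so the denominator is positive.)
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    r_wiou = math.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * l_iou, l_iou


def focusing_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gain r = beta / (delta * alpha ** (beta - delta)).

    beta is the outlier degree: this box's IoU loss divided by the running
    mean IoU loss. alpha and delta are assumed hyperparameter values.
    """
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3(pred, target, mean_l_iou, alpha=1.9, delta=3.0):
    """WIoU v3 loss for one box pair, given a positive running mean IoU loss."""
    l_v1, l_iou = wiou_v1(pred, target)
    beta = l_iou / mean_l_iou
    return focusing_gain(beta, alpha, delta) * l_v1
```

Under these assumptions the gain equals 1 when the outlier degree equals delta, peaks near beta = 1 / ln(alpha), and falls toward zero for both very accurate and very poor anchor boxes, so boxes of intermediate quality receive the largest share of the gradient.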