Abstract
Infrared target detection in near-space environments suffers from large variations in target scale, strong background noise, and blurred features caused by low contrast. To address these problems, this paper proposes YOLO-MARS, an efficient detection model based on YOLOv8. The model introduces a Space-to-Depth (SPD) convolution module into the backbone, which downsamples without discarding information and therefore retains the fine details of small targets that conventional downsampling would lose. A Grouped Multi-Head Self-Attention (GMHSA) module is added after the backbone's SPPF module to strengthen cross-scale global modeling of target-region feature responses while suppressing interference from complex thermal-noise backgrounds. In addition, a Light Adaptive Spatial Feature Fusion (LASFF) detection head is designed to mitigate the scale sensitivity of infrared targets, especially small ones, in the feature pyramid. It uses a shared weighting mechanism to fuse multi-scale features adaptively, reducing computational complexity while improving target localization and classification accuracy. To address the extreme scarcity of near-space data, we construct the NS-HIT dataset by integrating 284 near-space images with the HIT-UAV dataset on the basis of a physical equivalence analysis of atmospheric transmittance, contrast, and signal-to-noise ratio. Experimental results show that, compared with YOLOv8, YOLO-MARS increases mAP@0.5 by 5.4% while increasing the number of parameters by only 10%. YOLO-MARS thus improves detection accuracy significantly while keeping model complexity under control, providing an efficient and reliable solution for near-space infrared target detection.
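The claim that space-to-depth downsampling loses no information can be illustrated with a minimal NumPy sketch. It is not the YOLO-MARS implementation: the function names `space_to_depth` and `depth_to_space`, the `(C, H, W)` layout, and the scale factor of 2 are assumptions made for illustration, and the convolution that would follow the rearrangement in an SPD module is omitted.

```python
import numpy as np


def space_to_depth(x, scale=2):
    """Rearrange a (C, H, W) feature map into (C * scale**2, H / scale, W / scale).

    Every pixel is kept; spatial resolution is traded for channel depth.
    """
    c, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # One sub-sampled copy of the map per (row offset, column offset) pair.
    slices = [x[:, i::scale, j::scale] for i in range(scale) for j in range(scale)]
    return np.concatenate(slices, axis=0)


def depth_to_space(y, scale=2):
    """Inverse of space_to_depth, showing that the rearrangement is lossless."""
    c_out, h, w = y.shape
    c = c_out // (scale * scale)
    x = np.empty((c, h * scale, w * scale), dtype=y.dtype)
    k = 0
    for i in range(scale):
        for j in range(scale):
            # Put each group of channels back at its original pixel offsets.
            x[:, i::scale, j::scale] = y[k * c:(k + 1) * c]
            k += 1
    return x


# Hypothetical 2-channel, 4x6 feature map.
x = np.arange(2 * 4 * 6).reshape(2, 4, 6)
y = space_to_depth(x)
print(y.shape)  # (8, 2, 3)
```

Because `depth_to_space(space_to_depth(x))` reproduces `x` exactly, the halved spatial resolution comes with no loss of pixel values, in contrast to strided convolution or pooling, which skip or merge values.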