Abstract
Unmanned aerial vehicles (UAVs) enable efficient ground-object recognition under adverse illumination, yet infrared (IR) imagery still suffers from low contrast, background clutter, and tiny targets. We present YOLO-IR, a lightweight detector built on YOLOv7 that integrates four components: a global-efficient backbone to strengthen global-local thermal-texture modeling; the parameter-free SimAM attention to highlight salient IR structures; an efficient BiFPN for weighted bidirectional multi-scale fusion; and the normalized Wasserstein distance (NWD) for scale-insensitive localization across label assignment, box regression, and non-maximum suppression. On a UAV thermal dataset, YOLO-IR attains 94.5% precision, 92.9% recall, and 95.7% mAP@0.5, improving over the YOLOv7 baseline by +4.3% P, +1.8% R, and +4.2% mAP, while maintaining real-time throughput on a single GPU. Comprehensive ablations attribute consistent gains to each component, and qualitative results on dense, low-contrast scenes show fewer misses and false alarms. These findings indicate that YOLO-IR delivers accurate and efficient IR road-object recognition from UAV viewpoints.
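To illustrate why a normalized Wasserstein distance (NWD) can be less sensitive to object scale than IoU, the following is a minimal sketch of the commonly used NWD formulation, in which each box is modeled as a 2D Gaussian. It is not taken from the YOLO-IR implementation: the function names `nwd` and `iou`, the `(cx, cy, w, h)` box layout, and the normalizing constant `c = 12.8` are assumptions made here for illustration, and `c` would normally be tuned to the dataset.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes.

    Each box is treated as a 2D Gaussian with mean (cx, cy) and
    covariance diag(w^2 / 4, h^2 / 4). The constant c is an assumed,
    dataset-dependent normalizer, not a value from the paper.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Map the distance to a similarity in (0, 1].
    return math.exp(-math.sqrt(w2_sq) / c)


def iou(box_a, box_b):
    """Plain IoU between two (cx, cy, w, h) boxes, for comparison."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Overlap extent along each axis, clamped at zero.
    inter_w = max(0.0, min(cxa + wa / 2, cxb + wb / 2) - max(cxa - wa / 2, cxb - wb / 2))
    inter_h = max(0.0, min(cya + ha / 2, cyb + hb / 2) - max(cya - ha / 2, cyb - hb / 2))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0


# Two tiny 4x4 boxes, offset horizontally by one box width.
tiny_a = (10.0, 10.0, 4.0, 4.0)
tiny_b = (14.0, 10.0, 4.0, 4.0)
print("IoU:", iou(tiny_a, tiny_b))
print("NWD:", nwd(tiny_a, tiny_b))
```

For the two tiny boxes above, IoU is zero because the boxes only touch, whereas NWD still returns a graded similarity. That smooth response is the property that makes an NWD-style metric a plausible substitute for IoU in assignment, regression, and non-maximum suppression when targets span only a few pixels.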