Abstract
YOLO-based object detection networks are widely used in intelligent transportation and public safety. Compared with visible-light target detection, infrared target detection continues to work in low light and harsh environments. In visible-light scenes, YOLOv7-tiny offers both speed and accuracy. However, when YOLOv7-tiny is applied directly to infrared scenes, it shows several shortcomings: weak extraction of detailed features, serious loss of semantic information, and high computational cost. This paper therefore proposes LIWL-YOLO, a lightweight infrared target detection network suitable for both water and land targets. First, a lightweight backbone, SPFNet, is designed by integrating space-to-depth convolution (SPDConv) into FasterNet, improving the feature extraction ability and speed of YOLOv7-tiny on low-resolution images. Second, an attention module, SAF-CA, is designed and added to the neck so that the model pays more attention to weak texture features in the image. Third, to improve the extraction of low-contrast information, an exponential spatial pyramid pooling module is designed to replace the SPPCSPC module in YOLOv7-tiny. Finally, the MGD knowledge distillation method is used, with YOLOv7 as the teacher model, to transfer knowledge into the improved model and further increase its accuracy on infrared targets. A hybrid dataset named FLIR-WSL, which combines the FLIR-v2 dataset with infrared water-surface target images collected by our team, is constructed as the experimental dataset. Experimental results on FLIR-WSL show that LIWL-YOLO achieves an mAP of 69%, which is 4.3% higher than that of YOLOv7-tiny, and runs at 93 FPS on an RTX 4060 graphics card. LIWL-YOLO thus handles both land and water targets in infrared scenes while balancing accuracy and speed.
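To illustrate the space-to-depth idea behind SPDConv, the following is a minimal sketch in plain Python and not the paper's implementation. It covers only the rearrangement step for a single-channel image; the function name `space_to_depth`, the `block` parameter, and the row-major channel ordering are assumptions made for illustration. In SPDConv this rearrangement is typically followed by a non-strided convolution, which is omitted here.

```python
def space_to_depth(image, block=2):
    """Rearrange an H x W grid into an (H/block) x (W/block) grid whose
    cells each hold block*block values, so no pixel is discarded.

    `image` is a list of equal-length rows (a single-channel image).
    The channel ordering (row-major within each block) is an assumption.
    """
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")

    out = []
    for row in range(0, height, block):
        out_row = []
        for col in range(0, width, block):
            # Gather the block x block neighbourhood into one channel vector.
            channels = [image[row + dy][col + dx]
                        for dy in range(block)
                        for dx in range(block)]
            out_row.append(channels)
        out.append(out_row)
    return out


if __name__ == "__main__":
    tiny = [[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]
    # A 4 x 4 image becomes a 2 x 2 grid with 4 channels per cell.
    print(space_to_depth(tiny))
```

Unlike strided convolution or pooling, this downsampling keeps every input value by moving spatial detail into the channel dimension, which is why it is attractive for low-resolution images.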