Abstract
Comprehensive information on sheep behavior is essential for assessing the health status of sheep and preventing disease promptly. In particular, the ruminating behavior of sheep tends to reflect the health of their digestive system. However, ruminating behavior is difficult to detect, and traditional research has not yielded satisfactory results. Indeed, many computer vision-based methods do not consider ruminating behavior as a distinct category when recognizing sheep behavior. To this end, we proposed an efficient model based on You Only Look Once-v5 (YOLOv5) for recognizing active (standing, feeding, drinking, suckling), inactive (kneeling), and ruminating behaviors in sheep flocks. The model improved the original structure by introducing a recent fully convolutional neural network into the backbone network, thereby reducing the parameter count and improving classification accuracy. At the same time, the Spatial Pyramid Pooling Cross Stage Partial Connection module from You Only Look Once-v7 was added at the end of the backbone network, which enlarged the receptive field and thus improved the ability of the model to recognize sheep behaviors at multiple scales. In the neck network, a new algorithm, the Concat optimization algorithm based on channel concatenation (CiConcat), was proposed to optimize the regular Concat operation. It enhanced the ability of the model's small-target detection layer to filter out nonruminating behavior information, thereby improving the accuracy with which local details are extracted. In addition, a Selective Kernel attention mechanism was introduced before the detection head to enhance the model's ability to extract and represent sheep features. Finally, a detection head matching strategy was proposed to improve accuracy when detecting the ruminating behavior of distant sheep.
This study used a self-collected and annotated image dataset from an outdoor sheep farm for experimental validation. The results showed that the improved network achieved a mean Average Precision (mAP) of 94.1% on images of outdoor grazing sheep flocks and was both more accurate and faster than state-of-the-art methods. In addition, the Average Precision for ruminating detection reached 80.2%, an improvement of 35.44% compared with YOLOv5, markedly enhancing the accuracy of ruminating behavior recognition. The model was shown to be able to continuously monitor the behavior frequency of sheep, providing a robust technical foundation for analyzing behavior patterns and understanding the physiological and behavioral needs of sheep.
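For readers less familiar with the reported metrics, the following minimal sketch illustrates how a per-class Average Precision and the mean Average Precision over classes can be computed. It is an illustration under stated assumptions, not the evaluation code used in this study: the abstract does not specify the IoU threshold or the interpolation scheme, so all-point interpolation is assumed here, and the function names, detection lists, and AP values are hypothetical.

```python
from typing import Dict, List, Tuple


def average_precision(detections: List[Tuple[float, bool]],
                      num_ground_truth: int) -> float:
    """All-point interpolated AP for a single behavior class.

    detections: (confidence, is_true_positive) pairs. Matching each
    detection to a ground-truth box (e.g. by an IoU threshold) is assumed
    to have been done already; that step is not shown here.
    """
    if num_ground_truth == 0:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall curve.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Hypothetical detections for one class: two ground-truth instances,
# three predictions, the second of which is a false positive.
example_ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)

# Hypothetical per-class AP values, for illustration only.
example_map = mean_average_precision({"active": 0.9, "ruminating": 0.7})
```

Because mAP averages over classes, a low AP on a difficult class such as ruminating lowers the overall score even when the remaining classes are recognized well, which is why the per-class AP for ruminating is reported separately.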