Abstract
The rapid expansion of the goose farming industry creates a growing need for real-time flock counting and individual-level behavior monitoring. To meet this need, this study proposes an improved YOLOv8-based model, termed DAEF-YOLO (DualConv-augmented C2f, ADown down-sampling, Efficient Channel Attention integrated into SPPF, and FocalerIoU regression loss), designed for simultaneous recognition of Sanhua goose individuals and their diverse behaviors. The model incorporates three targeted architectural improvements: (1) a C2f-Dual module that enhances multi-scale feature extraction and fusion, (2) Efficient Channel Attention (ECA) embedded in the SPPF module to refine channel interaction at minimal parameter cost, and (3) an ADown down-sampling module that preserves cross-channel information continuity while reducing information loss. In addition, the FocalerIoU loss function improves bounding-box regression accuracy in complex detection scenarios. Experimental results demonstrate that DAEF-YOLO surpasses YOLOv5s, YOLOv7-Tiny, YOLOv7, YOLOv9s, and YOLOv10s in both accuracy and computational efficiency. Compared with YOLOv8s, DAEF-YOLO improves precision by 4.56%, recall by 6.37%, F1-score by 5.50%, and mAP@0.5 by 4.59%, reaching 94.65%, 92.17%, 93.39%, and 96.10%, respectively. A generalizable classification strategy is further introduced by adding a complementary "Other" category that covers behaviors outside the predefined classes. This approach ensures complete recognition coverage and demonstrates strong transferability for multi-task detection across species and environments. Ablation studies indicate that mAP@0.5 remained consistent (~96.1%) whether or not the "Other" class was included, while mAP@0.5:0.95 was higher without it (75.68% vs. 69.82%). Despite this trade-off, incorporating the "Other" category ensures annotation completeness and more robust multi-task behavior recognition under real-world variability.
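The reported F1-score can be cross-checked against the reported precision and recall. The snippet below is a minimal sketch that assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the variable names are illustrative and do not come from the study's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for DAEF-YOLO, in percent.
reported_precision = 94.65
reported_recall = 92.17
reported_f1 = 93.39

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.2f}%, reported F1 = {reported_f1:.2f}%")
# prints: computed F1 = 93.39%, reported F1 = 93.39%
```

Under this assumption the three reported figures are mutually consistent to two decimal places.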