Abstract
Accurate real-time detection of hawthorn by vision systems is a prerequisite for automated harvesting. In hawthorn orchards, target overlap, leaf occlusion, and environmental variation degrade the detection accuracy of existing methods, which also suffer from high computational demands and poor real-time performance. To overcome these limitations, we propose YOLO-DCL (group shuffling convolution and coordinate attention integrated with a lightweight head, based on YOLOv8n), a lightweight hawthorn detection model. The backbone employs dynamic group shuffling convolution (DGCST) for efficient feature extraction. In the neck, coordinate attention (CA) is integrated into the feature pyramid network (FPN) to form an enhanced multi-scale feature pyramid network (HSPFN); this integration also optimizes the C2f structure. The detection head uses shared convolution and batch normalization to streamline computation. In addition, the PIoUv2 (powerful intersection over union version 2) loss function is adopted for bounding-box regression; as a training-time loss, it adds no parameters or inference cost. Experiments show that YOLO-DCL achieves a precision of 91.6%, a recall of 90.1%, and a mean average precision (mAP) of 95.6%, while reducing the model size to 2.46 MB, with only 1.2 million parameters and a computational cost of 4.8 GFLOPs. To assess real-world applicability, we developed a detection system with the PySide6 framework and deployed it on an NVIDIA Jetson Xavier NX edge device. Field testing validated the model's robustness, accuracy, and real-time performance, confirming its suitability for integration into harvesting robots operating in practical orchard environments.
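For readers unfamiliar with the overlap measure behind IoU-family losses, the following minimal sketch computes plain intersection over union for two axis-aligned boxes. It is an illustration only: it does not implement the PIoUv2 penalty or attention terms, and the function name `iou`, the `(x1, y1, x2, y2)` box format, and the example coordinates are assumptions made here, not details taken from the model.

```python
def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; not the PIoUv2 loss itself.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the overlap counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Illustrative boxes (made-up pixel coordinates): a prediction and a ground truth.
pred = (10, 10, 50, 50)
truth = (30, 30, 70, 70)

overlap = iou(pred, truth)
basic_iou_loss = 1.0 - overlap  # the simplest IoU-based regression loss
print(f"IoU = {overlap:.4f}, 1 - IoU = {basic_iou_loss:.4f}")
```

Heavily overlapping or partly occluded fruit produces predicted boxes with low or ambiguous IoU against the ground truth, which is why refined IoU losses add further terms on top of this basic quantity.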