Abstract
Accurate and efficient fruit detection is essential for precision agriculture, particularly in densely occluded crops such as kiwifruit. This study presents a comprehensive benchmarking and optimization framework covering the YOLOv8–YOLOv11 architectures for kiwifruit detection, evaluated under both high-performance training conditions and embedded deployment on an NVIDIA Jetson TX2. A field-collected dataset containing 2,925 training and 1,936 test annotations was used to train five sub-models (n, s, m, l, x) of each YOLO version under identical settings. To improve efficiency for edge deployment, a structured hyperparameter optimization procedure was applied to all “s” models, yielding substantial performance gains without architectural modifications. Among all evaluated models, the optimized YOLOv11s achieved the best accuracy–efficiency trade-off, reaching mAP@0.5 = 0.956, precision = 0.868, recall = 0.918, and an embedded inference time of 3.33 s/image on the Jetson TX2. Although larger models (e.g., YOLOv8x, YOLOv11l) attained slightly higher raw accuracy (up to mAP@0.5 = 0.957), their latency rendered them unsuitable for edge deployment. The results demonstrate that lightweight YOLO architectures, when supported by targeted hyperparameter tuning, can be effectively adapted to resource-constrained agricultural systems. The proposed evaluation and optimization pipeline provides a transferable methodology for other fruit-detection tasks and supports the future development of embedded vision solutions in precision agriculture.