Abstract
Accurately detecting and counting potatoes during early harvest is essential for estimating yield, automating sorting, and supporting data-driven agricultural decisions. However, field environments often present practical challenges, such as soil occlusion, overlapping tubers, and inconsistent lighting, that hinder robust visual recognition. In response, we introduce SCG-YOLOv8n, a compact, field-adapted detection framework built on the YOLOv8n architecture and tailored for small-object detection in real-world farming conditions. The model incorporates three practical enhancements: a C-SPD module that preserves spatial detail to improve recognition of partially buried tubers; an S-CARAFE operator that reconstructs fine-scale features during upsampling; and GhostShuffleConv layers that reduce computational overhead without sacrificing accuracy. In extensive field-based experiments, SCG-YOLOv8n consistently outperforms YOLOv5n and the baseline YOLOv8n across all key metrics. Float16 quantization compresses the model to 3.2 MB, enabling real-time inference on Android devices. We also developed PotatoDetector, a mobile application that demonstrates stable performance in field trials, achieving an RMSE of 1.38 and an R² of 0.96 in counting tasks. These results suggest that SCG-YOLOv8n offers a practical and scalable tool for precision agriculture, with potential applicability to other root and tuber crop monitoring scenarios.