Abstract
Occlusion and overlapping fruit are major challenges in picking red pepper clusters and hinder the development of intelligent red pepper cluster picking robots. To achieve accurate and efficient picking, we propose a lightweight red pepper cluster recognition model, Red-YOLO. To improve the efficiency of the robotic arm during picking, we constructed a custom dataset containing both diffuse and clustered red pepper clusters. During picking, the model is deployed to distinguish between these types of pepper clusters. Once the clusters of one type have been picked, the size of the end effector can be adjusted to pick the other type, thereby improving picking efficiency. We first compared models from the YOLO series in preliminary experiments and, based on the results, selected YOLOv8n as the baseline model. A CBAM attention mechanism is integrated into the backbone network to enhance the model's focus on red pepper cluster features via MLP-based adaptive channel weighting. The original upsampling operator is replaced with the CARAFE module, whose content-adaptive feature reassembly improves the use of spatial detail and contextual information in densely overlapping clusters. In addition, GSConv and VoV-GSCSP modules are incorporated to make the structure lightweight. Compared with the baseline model, Red-YOLO improves precision (P), recall (R), and mAP50 by 1.4%, 6.1%, and 3.2%, respectively, while the number of model parameters decreases by 1% and GFLOPs decrease by 5%. Experimental results demonstrate that Red-YOLO offers fast detection speed and high accuracy in real-time red pepper cluster detection. The model provides technical support for identifying red pepper clusters on low-computing devices, such as mobile and embedded systems.
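As an illustration of the MLP-based adaptive channel weighting mentioned above, the following is a minimal NumPy sketch of the channel-attention branch as it is commonly formulated for CBAM: average- and max-pooled channel descriptors pass through a shared two-layer MLP, are summed, and are squashed by a sigmoid. It is not the Red-YOLO implementation. The function name `channel_attention`, the random weights, the feature-map size, and the reduction ratio `r = 4` are assumptions made only for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feature_map, w1, w2):
    """Channel-attention sketch (hypothetical helper, not the authors' code).

    feature_map: array of shape (C, H, W)
    w1: first MLP layer, shape (C // r, C)
    w2: second MLP layer, shape (C, C // r)
    """
    # Squeeze the spatial dimensions into two per-channel descriptors.
    avg_desc = feature_map.mean(axis=(1, 2))  # shape (C,)
    max_desc = feature_map.max(axis=(1, 2))   # shape (C,)

    def shared_mlp(v):
        # The same two-layer MLP (ReLU in between) is applied to both descriptors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    # One weight in (0, 1) per channel.
    weights = sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))
    # Rescale each channel of the input by its weight.
    return feature_map * weights[:, None, None], weights


# Illustrative sizes and random weights (assumptions, not values from the paper).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

refined, weights = channel_attention(x, w1, w2)
```

Here `refined` has the same shape as `x`, and `weights` holds one scaling factor per channel. In a trained network, `w1` and `w2` would be learned so that channels carrying pepper-cluster features receive larger weights.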