Abstract
Robotic pollination is a key component of smart agriculture, and foundational architectures for target recognition, path planning, and motion control have been progressively established. However, developing an efficient and robust pollination system that integrates perception, decision-making, and execution in real-world scenarios remains challenging. This study systematically reviews recent advances in the field and distills the core technical issues of greenhouse robotic pollination into three primary domains: target detection and pose estimation, end-effector design, and pollination strategies combined with motion control. Focusing on the visual perception of flowers, actuator architecture, and operational tactics, this review synthesizes existing findings to evaluate the state of the art in flower detection and pose estimation, characterize diverse end-effector designs, and trace the evolution of motion control techniques. Specifically, the analysis covers the impact of detection algorithms on recognition accuracy and robustness, the structural classification and performance attributes of pollination mechanisms, and the optimization of control strategies. Furthermore, the study categorizes research backgrounds worldwide, technical methodologies, and representative system cases, and critically evaluates the experience gained in constructing automated pollination systems. Despite these advances, current robotic pollination technologies for chili peppers face significant bottlenecks: immature methods for precise flower detection and pose estimation, specialized end-effector designs that still need optimization, and insufficient robustness of decision-making systems under dynamic environmental conditions.
To address these issues, future development should prioritize constructing diverse, large-scale flower image and pose datasets while developing detection algorithms that adapt to complex environments and achieve high-precision identification. Additionally, implementing such a pollination system requires a hierarchical architecture in which perception drives adaptive actuation. Deep learning models must localize flower targets and assess maturity in real time, feeding coordinates to path planners that generate collision-free trajectories through foliage. These trajectories are executed via multimodal motion control, which synchronizes the rigid manipulator with soft end-effectors. Embedding tactile feedback into the machine learning loop gives the system a unified sensorimotor framework. This enables dynamic force modulation based on physical resistance, ensuring precise, non-destructive pollination tailored to chili plants.
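
As an illustration only, the hierarchical flow described above (maturity-based target selection, path planning, and tactile force modulation) can be sketched in a few lines of Python. Everything in this sketch is an assumption made for clarity rather than a method taken from the reviewed literature: the `Flower` record, the maturity threshold, the straight-line stand-in for a collision-aware planner, and the proportional force update with its gain and force limits are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Flower:
    position: Point   # flower centre in the robot base frame (metres), assumed
    maturity: float   # hypothetical maturity score in [0, 1] from a detector


def select_targets(detections: List[Flower], min_maturity: float = 0.7) -> List[Flower]:
    """Perception stage: keep only flowers judged mature enough to pollinate."""
    return [f for f in detections if f.maturity >= min_maturity]


def plan_path(start: Point, goal: Point, steps: int = 5) -> List[Point]:
    """Planning stage placeholder: straight-line interpolation between two points.

    A real planner would also check every waypoint for collisions with foliage.
    """
    return [
        tuple(s + (g - s) * i / steps for s, g in zip(start, goal))
        for i in range(steps + 1)
    ]


def modulate_force(commanded: float, measured: float, target: float = 0.05,
                   gain: float = 0.5, max_force: float = 0.2) -> float:
    """Actuation stage: proportional update of the commanded contact force (N).

    The command rises when the measured contact force is below the target,
    falls when it is above, and is clamped to [0, max_force] to protect the flower.
    """
    updated = commanded + gain * (target - measured)
    return max(0.0, min(max_force, updated))


if __name__ == "__main__":
    detections = [Flower((0.4, 0.1, 0.3), 0.9), Flower((0.5, -0.2, 0.35), 0.3)]
    for flower in select_targets(detections):
        path = plan_path((0.0, 0.0, 0.0), flower.position)
        print(len(path), "waypoints towards", flower.position)
```

In a full system each stage would be replaced by a learned detector, a collision-aware planner, and a calibrated force controller; the sketch only shows how the output of one layer feeds the next.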