Abstract
To address key challenges in unstructured environments, including local-optimum traps, limited real-time interaction, and slow convergence, this work proposes a hybrid reinforcement learning approach that combines simulated annealing (SA) with proximal policy optimization (PPO) for robotic-arm trajectory planning. The framework enables accurate, collision-free grasping of randomly appearing objects amid dynamic obstacles through three key innovations: a probabilistically enhanced simulation environment with a 20% obstacle generation rate; an optimized state-action space featuring a 12-dimensional environment encoding and 6-DoF joint control; and an SA-PPO algorithm that dynamically adjusts the learning rate to balance exploration and convergence. Experimental results show a 6.52% relative increase in success rate (98% vs. 92%) and a 7.14% reduction in steps per episode compared with the baseline PPO. Deployment on a physical AUBO-i5 robotic arm demonstrates real-world grasping and validates robust transfer from simulation to reality. This work establishes a new paradigm for adaptive robotic manipulation in industrial scenarios that require real-time responses to environmental uncertainty.
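The SA-PPO learning-rate adjustment mentioned above can be illustrated with a minimal sketch. The abstract does not specify the schedule, so the form below (an exponentially cooling temperature that interpolates the learning rate between an exploratory initial value and a small final value) and all hyperparameter names and values are illustrative assumptions, not the paper's actual method.

```python
def sa_learning_rate(step: int,
                     lr_init: float = 3e-4,   # assumed initial PPO learning rate
                     lr_min: float = 1e-5,    # assumed floor for late training
                     decay: float = 0.995) -> float:
    """Simulated-annealing-style learning-rate schedule (illustrative).

    A temperature T = decay**step cools toward zero, so the learning rate
    moves from lr_init (favoring exploration early on) down to lr_min
    (favoring stable convergence late in training).
    """
    temperature = decay ** step
    return lr_min + (lr_init - lr_min) * temperature
```

In an actual PPO training loop, this value would be fed to the policy optimizer before each update; the cooling rate controls how quickly the algorithm shifts from exploration toward convergence.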