Abstract
Unmanned aerial vehicles (UAVs) are increasingly used in diverse scenarios, including disaster relief and delivery services. In these applications, 3D path planning is of substantial research interest because it directly influences the operational efficiency, safety, and adaptability of the UAV. Nevertheless, efficient 3D path planning for UAVs in complex, predefined environments remains challenging because exact methods are computationally intractable and metaheuristics are prone to becoming trapped in local optima. Although recent studies have sought to enhance planners through multi-strategy fusion, they often rely on static heuristic rules and fixed parameter settings. To address these limitations, this paper presents a reinforcement learning-based hybrid algorithm that integrates the Probabilistic Roadmap (PRM) method with Ant Colony Optimization (ACO), termed PRM-QACO. First, the algorithm uses PRM to generate a random 3D graph, which simplifies the 3D search space and improves exploration efficiency. Second, it incorporates directional information into the ACO heuristic, enabling the UAV to reach the target more efficiently within the 3D space. Third, and most distinctively, a Q-learning module is embedded as an intelligent controller that dynamically balances exploration and exploitation by rewarding or penalizing the ants' search outcomes, thereby improving the paths discovered by elite ants. Finally, a path optimization mechanism is introduced to minimize the number of turns in the planned path, which is crucial for the UAV to conserve energy and circumvent obstacles. Simulation experiments conducted in MATLAB and AirSim across various 3D terrains demonstrate that PRM-QACO is an effective solution for 3D UAV path planning.
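The turn-reduction step can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the abstract does not specify the optimization mechanism, so the sketch assumes a greedy line-of-sight shortcut, spherical obstacles given as (center, radius) pairs, and the hypothetical helper names `segment_hits_sphere` and `prune_waypoints`.

```python
import math


def segment_hits_sphere(p, q, center, radius):
    """Return True if the segment p-q passes within `radius` of `center`.

    Assumption: obstacles are modeled as spheres, purely for illustration.
    """
    d = [q[i] - p[i] for i in range(3)]        # segment direction
    f = [center[i] - p[i] for i in range(3)]   # start point -> sphere center
    dd = sum(x * x for x in d)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0 if dd == 0 else max(0.0, min(1.0, sum(f[i] * d[i] for i in range(3)) / dd))
    closest = [p[i] + t * d[i] for i in range(3)]
    dist = math.sqrt(sum((closest[i] - center[i]) ** 2 for i in range(3)))
    return dist < radius


def prune_waypoints(path, obstacles):
    """Greedily drop intermediate waypoints when a straight segment is free.

    `path` is a list of (x, y, z) waypoints and `obstacles` is a list of
    (center, radius) pairs. Fewer waypoints means fewer turns.
    """
    if len(path) < 3:
        return list(path)
    pruned = [path[0]]
    i = 0
    while i < len(path) - 1:
        # Search backwards for the farthest waypoint visible from path[i].
        j = len(path) - 1
        while j > i + 1 and any(
            segment_hits_sphere(path[i], path[j], c, r) for c, r in obstacles
        ):
            j -= 1
        pruned.append(path[j])
        i = j
    return pruned
```

With no obstacles, any path collapses to its start and end points. With a sphere blocking the direct line, only the waypoints needed to route around it are kept. The sketch assumes the input path is itself collision-free, since it falls back to the next original waypoint when no shortcut is available.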