Deep deterministic policy gradient algorithm based on dung beetle optimization and priority experience replay mechanism

Abstract

Reinforcement learning algorithms for continuous action spaces often converge slowly and become trapped in local optima. To address this, we propose DBOP-DDPG, a deep deterministic policy gradient algorithm that combines the dung beetle optimizer (DBO) with a priority experience replay mechanism. The method first introduces DBO to search with multiple populations simultaneously, which helps keep the algorithm from falling into local optima and improves its global optimization capability. We then design a criterion for determining the priority of sample data and improve how the experience replay mechanism samples: sample data are stored in three replay buffers according to their importance and drawn from them in subsequent training, which improves the algorithm's convergence speed. Finally, we test the method in three classic control environments from OpenAI Gym. The results show that it converges at least 10% faster than the comparison algorithm and increases the cumulative reward by up to 150.
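The three-way storage of sample data described above can be illustrated with a minimal Python sketch. The abstract does not state the priority criterion, the thresholds, or how the three stores are mixed during sampling, so everything here is an assumption made for illustration: the class name `TieredReplayBuffer`, the use of an absolute priority score (for example, the magnitude of a TD error), the two cut-off values, and the fixed per-tier sampling ratios are hypothetical and are not taken from the paper.

```python
import random
from collections import deque


class TieredReplayBuffer:
    """Illustrative three-tier replay store (high, medium, low priority).

    Assumptions, not from the paper: priority is a non-negative score such
    as |TD error|, tiers are split by two fixed thresholds, and each batch
    is drawn with fixed per-tier ratios.
    """

    def __init__(self, capacity=10000, thresholds=(1.0, 0.1),
                 ratios=(0.5, 0.3, 0.2), seed=None):
        self.high_cut, self.low_cut = thresholds  # hypothetical cut-offs
        self.ratios = ratios                      # hypothetical batch shares
        # One FIFO buffer per tier; the oldest items drop out when full.
        self.buffers = [deque(maxlen=capacity) for _ in range(3)]
        self.rng = random.Random(seed)

    def tier_of(self, priority):
        """Map a priority score to a tier index (0 = most important)."""
        if priority >= self.high_cut:
            return 0
        if priority >= self.low_cut:
            return 1
        return 2

    def add(self, transition, priority):
        """Store a transition in the tier chosen by its absolute priority."""
        self.buffers[self.tier_of(abs(priority))].append(transition)

    def __len__(self):
        return sum(len(buf) for buf in self.buffers)

    def sample(self, batch_size):
        """Draw up to batch_size transitions without replacement."""
        chosen, leftover = [], []
        for buf, ratio in zip(self.buffers, self.ratios):
            items = list(buf)
            self.rng.shuffle(items)
            k = min(len(items), int(batch_size * ratio))
            chosen.extend(items[:k])
            leftover.extend(items[k:])
        # If some tier was too small, top up from whatever remains.
        shortfall = min(batch_size - len(chosen), len(leftover))
        if shortfall > 0:
            chosen.extend(self.rng.sample(leftover, shortfall))
        return chosen
```

In a DDPG-style training loop, each transition `(state, action, reward, next_state, done)` would be passed to `add` together with its priority score, and `sample` would supply the minibatch for the critic and actor updates. Fixed ratios are only one possible design. The paper's actual sampling rule may differ.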