Abstract
We present a data-driven deep reinforcement learning (DRL) method for optimizing a hierarchically structured control policy that incorporates a central pattern generator (CPG). The method as a whole, referred to as hierarchical reinforcement learning with a central pattern generator (HRL-CPG), is evaluated with a view to its applicability to real robot control. We observed that stable gait motions were obtained within a reasonably small number of trials. We therefore conclude that HRL-CPG is a promising DRL method for enabling dynamical systems, such as real or realistic robots, to adapt to a variety of environments within a moderate amount of physical time.