Abstract
Traditional deep reinforcement learning (DRL)-based resource allocation for cloud-edge-end computing depends primarily on known state parameters and real-time reward feedback when making decisions. Because such models rely heavily on prior knowledge of the scene and on real-time feedback, they struggle to deliver effective services in complex scenarios. We propose a digital twin (DT)-aided Expert-driven Generative Adversarial Imitation Learning (E-GAIL) model that uses imitation learning to jointly allocate multiple constrained resources. First, we introduce a single-expert trajectory generation algorithm based on Actor-Critic and NoisyNet that exploits the rich historical data provided by DT networks. The critic iteratively updates the network, which improves the fidelity of the imitated expert trajectory. Second, we fuse different single-expert trajectories into a multi-expert trajectory to expand coverage, and we use the Nash equilibrium to identify the optimal equilibrium solution and reduce conflicts among the experts. Finally, during training, the parameters of the generator and discriminator in E-GAIL are updated along their respective gradients to fit the multi-expert trajectory. Once a task is uploaded, the E-GAIL agent on the edge server can rapidly obtain a resource allocation policy, even without prior knowledge or real-time reward feedback. Experimental results indicate that E-GAIL obtains the best-fitting expert trajectory in large-scale noisy environments.
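To illustrate the adversarial step summarized above, the following is a minimal sketch of a generic GAIL-style discriminator update, not the E-GAIL implementation itself. It assumes a logistic discriminator over hand-made state-action feature vectors, labels expert samples as 1 and generator samples as 0, and uses -log(1 - D) as the surrogate reward. All function names, the toy data, and the hyperparameters are hypothetical, and the generator (policy) update is omitted.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def discriminate(w, x):
    """Probability that feature vector x (a state-action pair) is from the expert."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def discriminator_step(w, expert_batch, policy_batch, lr):
    """One gradient-ascent step on mean log D(expert) + mean log(1 - D(policy))."""
    grad = [0.0] * len(w)
    for x in expert_batch:
        d = discriminate(w, x)
        for i, xi in enumerate(x):
            # d/dw log D = (1 - D) * x
            grad[i] += (1.0 - d) * xi / len(expert_batch)
    for x in policy_batch:
        d = discriminate(w, x)
        for i, xi in enumerate(x):
            # d/dw log(1 - D) = -D * x
            grad[i] -= d * xi / len(policy_batch)
    return [wi + lr * gi for wi, gi in zip(w, grad)]

def train_discriminator(expert_batch, policy_batch, steps=500, lr=0.5):
    """Repeat the update from a zero initialization (assumed hyperparameters)."""
    w = [0.0] * len(expert_batch[0])
    for _ in range(steps):
        w = discriminator_step(w, expert_batch, policy_batch, lr)
    return w

def surrogate_reward(w, x):
    """Reward handed to the generator in place of an environment reward."""
    return -math.log(1.0 - discriminate(w, x) + 1e-8)

# Toy data (hypothetical): each sample is [bias, fraction of a resource allocated].
EXPERT = [[1.0, 0.7], [1.0, 0.8], [1.0, 0.9]]
POLICY = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.3]]

if __name__ == "__main__":
    w = train_discriminator(EXPERT, POLICY)
    print("D(expert-like):", discriminate(w, [1.0, 0.8]))
    print("D(policy-like):", discriminate(w, [1.0, 0.2]))
```

In a full training loop, the generator would then be updated to increase the surrogate reward on its own samples, so that its trajectories become harder to distinguish from the multi-expert trajectory.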