Abstract
Unmanned aerial vehicles (UAVs) deployed as aerial base stations (ABSs) can provide economical, on-demand wireless access. This research investigates dynamic resource allocation in multi-UAV-enabled communication systems with the aim of maximizing long-term rewards. Specifically, each UAV independently selects the users it serves, its transmit power level, and its sub-channel to communicate with ground users, without exchanging information with other UAVs. To capture the unpredictability of the environment, we formulate the long-term resource allocation problem as a stochastic game that maximizes the expected reward, in which each UAV acts as a learning agent and each resource allocation decision corresponds to an action taken by that UAV. We then develop a reward-based multi-agent learning (RMAL) framework in which each agent learns its best strategies from local observations. In particular, we propose an agent-independent method in which each agent runs its decision algorithm separately while sharing a common Q-learning-based structure. Simulation results show that the performance of the proposed RMAL-based resource allocation scheme can be improved by properly tuning the exploration and exploitation parameters. Moreover, the proposed RMAL algorithm achieves acceptable performance compared with full information exchange among UAVs, thereby striking a satisfactory trade-off between performance gains and the additional burden of information transmission.
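As a rough illustration of the agent-independent, Q-learning-based structure described above, the sketch below implements one independent epsilon-greedy Q-learner per UAV over joint (user, power level, sub-channel) actions, updated from local observations only. This is a minimal sketch under our own assumptions: the problem sizes, the state representation, and the placeholder reward are hypothetical, since the abstract does not specify the paper's actual state, action, or reward definitions.

    import random
    from collections import defaultdict
    from itertools import product

    # Hypothetical problem sizes; the paper does not specify these values.
    N_USERS, N_POWER_LEVELS, N_CHANNELS = 3, 2, 2
    ACTIONS = list(product(range(N_USERS), range(N_POWER_LEVELS), range(N_CHANNELS)))

    class UAVAgent:
        """One UAV as an independent Q-learner over (user, power, sub-channel) actions."""

        def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.2):
            self.q = defaultdict(float)  # Q[(state, action)] -> estimated long-term value
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, state):
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if random.random() < self.epsilon:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, next_state):
            # Standard Q-learning update, using only this agent's local observation.
            best_next = max(self.q[(next_state, a)] for a in ACTIONS)
            td_target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

    # Toy training loop: two UAVs learn independently, with no information exchange.
    agents = [UAVAgent() for _ in range(2)]
    state = 0  # placeholder single-state environment for illustration
    for step in range(1000):
        for agent in agents:
            action = agent.act(state)
            reward = random.random()  # placeholder for the true network reward (e.g., throughput)
            agent.update(state, action, reward, state)

Tuning epsilon (exploration) against greedy action selection (exploitation) in this sketch corresponds to the exploration and exploitation parameters whose tuning the abstract reports as improving performance.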