Abstract
This paper studies deep reinforcement learning (DRL)-based joint resource allocation and three-dimensional (3D) trajectory optimization for unmanned aerial vehicle (UAV)-ground access point (GAP) cooperative non-orthogonal multiple access (NOMA) communication in Industrial Internet of Things (IIoT) systems. Cooperative and non-cooperative users adopt different signal transmission strategies to meet diverse, task-oriented quality-of-service requirements. Specifically, a DRL framework based on the Soft Actor-Critic algorithm is proposed to jointly optimize user scheduling, power allocation, and the UAV trajectory in continuous action spaces. Closed-form power allocation and maximum-weight bipartite matching are integrated into the framework to enable efficient user pairing and resource management. Simulation results show that the proposed scheme significantly improves system throughput, spectral efficiency, and interference management while remaining robust to channel uncertainties in dynamic IIoT environments. The findings indicate that combining model-free reinforcement learning with conventional optimization provides a viable solution for adaptive resource management in UAV-GAP cooperative communication scenarios.