Abstract
To maintain covert transmission in wireless networks monitored by colluding wardens, we develop a reinforcement learning framework that jointly optimizes cooperative jamming and relay selection. We consider a multi-relay-assisted two-hop network in which each candidate relay dynamically acts as either an information relay or a cooperative jammer to enhance covertness. A reinforcement learning-based relay selection (RLRS) scheme selects the relays used for signal forwarding and for jamming so as to maximize covert throughput while guaranteeing the required detection failure probability at the wardens, subject to strict power budgets. Numerical simulations show that RLRS outperforms conventional random relay selection (RRS) on several performance metrics, achieving (i) higher peak covert transmission rates, (ii) lower outage probabilities, and (iii) better adaptability to changes in network parameters, including relay density, power allocation, and additive white Gaussian noise (AWGN) levels. These results support the effectiveness of reinforcement learning for relay and jammer selection in covert communication against colluding wardens.