Abstract
To address the problem of state estimation and detection in the presence of noisy sensor observations, probing costs, and communication noise, we propose in this paper a soft actor-critic (SAC) deep reinforcement learning (DRL) framework for dynamically scheduling sensors and sequentially probing the state of a stochastic system. Moreover, to cope with Byzantine attacks, we design a generative adversarial network (GAN)-based framework to identify Byzantine sensors. The GAN-based Byzantine detector and the SAC-DRL-based agent operate in coordination to detect the state of the system reliably and quickly while incurring a low sensing cost. To evaluate the proposed framework, we measure performance in terms of detection accuracy, stopping time, and the total probing cost needed for detection. Via simulation results, we analyze the performance and demonstrate that, owing to the maximum entropy strategy, soft actor-critic algorithms select actions flexibly and effectively in imperfectly known environments and achieve stable performance in challenging test cases (e.g., those involving jamming attacks, imperfectly known noise power levels, and high sensing costs). We also compare the performance of the proposed soft actor-critic algorithm with that of conventional actor-critic algorithms and fixed scheduling strategies. Finally, we analyze the impact of Byzantine attacks and quantify the reliability and accuracy improvements achieved by the GAN-based approach when combined with the SAC-DRL-based decision-making agent.