Abstract
The rapid advancement of artificial intelligence technology is giving rise to new forms of cyber threats, such as memorization-based advanced persistent threat (APT) attacks, which pose significant risks to critical infrastructure and present serious challenges to conventional security architectures. As a crucial service information system in railway passenger stations, the Railway Passenger Service System (RPSS) is particularly vulnerable because of its widely distributed terminals and large attack surface. This paper focuses on two key challenges within the Double-Layer Dynamic Heterogeneous Redundancy (DDHR) architecture of the RPSS Cloud Center under such attacks: (i) the inability to accurately estimate the scheduling time of redundant executors, and (ii) the absence of an intelligent defense scheduling method capable of countering memorization-based attacks within a unified and quantifiable environment. To address these issues, we first formulate the problem of optimizing the defender's payoff under incomplete information, using the information entropy of DDHR redundant executors to reflect attacking and defending behaviors. We then propose a method for estimating attack time, which overcomes the difficulty of determining scheduling time under incomplete information. Finally, we introduce the PPO_EH approach, a Proximal Policy Optimization (PPO) algorithm enhanced with the quantifiable information Entropy and Heterogeneity of DDHR redundant executors. Extensive experiments were conducted to evaluate the approach on two entropy-related metrics: information entropy decay amount and information entropy decay rate. The results demonstrate that the PPO_EH approach achieves the highest efficiency per scheduling operation in countering attacks and provides the longest resistance time against memorization-based attacks under identical initial information entropy conditions.