Abstract
To address strong dynamic coupling, explosion of the action-space dimension, and voltage imbalance in reactive power and voltage scheduling for cross-regional power grids, this paper proposes a hierarchical coordinated scheduling method based on multi-agent reinforcement learning. First, a multi-agent reinforcement learning framework driven by probabilistic neural networks is constructed. It performs distributed representation learning on the joint state vectors and predicts the reactive power and voltage operating state of each node with high accuracy (mean absolute error, MAE, below 0.01 p.u.). Building on these predictions, a three-layer "prediction-decision-regulation" coordination mechanism is designed that integrates environmental state perception, action-space optimization, and dynamic sensitivity analysis. This mechanism addresses the difficulty of making real-time decisions in high-dimensional action spaces and reduces the average scheduling decision time by approximately 34.2%. Finally, sensitivity-driven feedback regulation balances reactive power and voltage at each node in real time, guiding the grid to converge stably to an optimal power flow state. Experiments on the IEEE 33-node system show that the proposed method raises the voltage qualification rate to 98.7%, reduces system power loss by 30.5%, and lowers the maximum voltage magnitude deviation from 1.679 p.u. to 1.589 p.u., outperforming the traditional methods used for comparison.
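The abstract does not state the exact regulation law used in the sensitivity-driven feedback layer. As a rough illustration only, the sketch below assumes a linearized voltage/reactive-power sensitivity matrix `s` (entries approximating dV_i/dQ_j) and a decoupled proportional update dQ_i = gain * (v_ref - V_i) / S_ii, with reactive limits applied by clipping. It also assumes a toy linear plant V = V0 + S Q and a 0.95 to 1.05 p.u. qualification band. All function names, parameters, and numbers here are hypothetical and are not taken from the paper.

```python
"""Hypothetical sketch of sensitivity-driven reactive-power feedback on a toy linear grid model."""

from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def feedback_step(v: Vector, s: Matrix, v_ref: float, gain: float,
                  q: Vector, q_min: float, q_max: float) -> Vector:
    """One decoupled update using only the diagonal sensitivity S_ii, then clip Q to its limits."""
    new_q = []
    for i, (vi, qi) in enumerate(zip(v, q)):
        dq = gain * (v_ref - vi) / s[i][i]
        new_q.append(min(q_max, max(q_min, qi + dq)))
    return new_q


def qualification_rate(v: Vector, lo: float = 0.95, hi: float = 1.05) -> float:
    """Fraction of nodes whose voltage lies inside the assumed [lo, hi] p.u. band."""
    return sum(lo <= vi <= hi for vi in v) / len(v)


def simulate(v0: Vector, s: Matrix, v_ref: float = 1.0, gain: float = 0.5,
             steps: int = 50, q_min: float = -2.0, q_max: float = 2.0) -> Tuple[Vector, Vector]:
    """Iterate feedback against the toy plant V = V0 + S Q (a stand-in for a real power flow)."""
    q = [0.0] * len(v0)
    v = list(v0)
    for _ in range(steps):
        q = feedback_step(v, s, v_ref, gain, q, q_min, q_max)
        v = [a + b for a, b in zip(v0, mat_vec(s, q))]
    return v, q
```

For example, with `v0 = [0.93, 0.94, 0.96]` and a diagonally dominant `s`, only one of three nodes starts inside the band. After 50 iterations all node voltages settle at 1.0 p.u. The diagonal-only update is chosen for simplicity. Its error contracts by the matrix I - gain * diag(S)^-1 * S, so it converges only when that matrix has spectral radius below 1, which holds here but is not guaranteed for arbitrary grids.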