Abstract
To address active voltage control in photovoltaic (PV)-integrated distribution networks characterized by weak voltage support conditions, this paper proposes a multi-agent deep reinforcement learning (MADRL)-based coordinated control method for PV clusters. First, the voltage control problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and a centralized training with decentralized execution (CTDE) framework is adopted, enabling each inverter to make independent decisions based solely on local measurements during the execution phase. To balance voltage compliance with energy efficiency, two barrier functions are designed to reshape the reward function, introducing an adaptive penalization mechanism: a steeper gradient in the violation region accelerates voltage recovery to the nominal range, while a gentler gradient in the safe region minimizes excessive reactive regulation and power losses. Furthermore, six representative MADRL algorithms (COMA, IDDPG, MADDPG, MAPPO, SQDDPG, and MATD3) are employed to solve the active voltage control problem of the distribution network. Case studies on a modified IEEE 33-bus system demonstrate that the proposed framework ensures voltage compliance while effectively reducing network losses. The MADDPG algorithm achieves a Controllability Ratio (CR) of 91.9% while maintaining power loss at approximately 0.0695 p.u., demonstrating superior convergence and robustness. Comparisons with optimal power flow (OPF) and droop control methods confirm that the proposed approach significantly improves voltage stability and energy efficiency under model-free and communication-constrained weak grid conditions.
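The two-barrier reward shaping described above can be illustrated with a minimal sketch. All thresholds, coefficients, and function names here are illustrative assumptions, not the paper's actual formulation: a steep quadratic penalty outside an assumed nominal band drives fast voltage recovery, while a gentle penalty on reactive output inside the band discourages excessive regulation.

```python
# Hypothetical sketch of the adaptive two-barrier reward shaping.
# The nominal band [0.95, 1.05] p.u. and the coefficients are assumptions.

V_LOW, V_HIGH = 0.95, 1.05   # assumed nominal voltage band (p.u.)

def shaped_reward(v, q, k_violation=50.0, k_safe=0.1):
    """Barrier-shaped reward for one bus voltage v (p.u.)
    and inverter reactive output q (p.u.)."""
    if v < V_LOW:
        # Violation region: steep gradient pushes voltage back into band.
        return -k_violation * (V_LOW - v) ** 2
    if v > V_HIGH:
        return -k_violation * (v - V_HIGH) ** 2
    # Safe region: gentle gradient penalizes unnecessary reactive effort.
    return -k_safe * abs(q)
```

Because the safe-region term depends only on reactive effort, an agent that keeps voltage within the band is still rewarded for reducing its reactive output, which is how the shaping trades voltage compliance against network losses.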