Abstract
This paper introduces Adaptive Trust Region Policy Optimization for Action Space Compression (ATRPO-ACS), a deep reinforcement learning framework built on trust-region strategies and designed for adaptive control in high-dimensional continuous action spaces. By integrating distributed KL-constraint optimization with manifold projection and residual compensation, the framework improves sampling efficiency and real-time performance while reducing trajectory tracking errors and voltage-limit violations. Experimental validation demonstrates its effectiveness: robotic-arm tracking error remains within ±0.08 mm, microgrid scheduling cost is reduced by 28.5%, and production cycles on automotive welding lines are notably shortened. These results provide theoretical and technical support for real-time optimization control in industrial intelligent systems.
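For context, the trust-region strategy named in the abstract is conventionally formulated as a KL-constrained policy update; the sketch below is the standard trust region policy optimization objective, not an equation taken from this paper:

```latex
\max_{\theta} \;\; \mathbb{E}_{s,a \sim \pi_{\theta_{\mathrm{old}}}}
\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)} \,
A^{\pi_{\theta_{\mathrm{old}}}}(s,a) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{s} \left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\mathrm{old}}}(\cdot \mid s)
\,\|\, \pi_{\theta}(\cdot \mid s) \right) \right] \le \delta,
```

where $A^{\pi_{\theta_{\mathrm{old}}}}$ is the advantage under the old policy and $\delta$ bounds the average policy shift per update; the "distributed KL constraint optimization" described above would distribute or adapt this constraint, in a manner detailed in the body of the paper.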