Abstract
This study presents Reinforcement Operator Learning (ROL), a hybrid control paradigm that couples a Deep Operator Network (DeepONet), which acquires a generalised control law offline, with a Twin Delayed Deep Deterministic Policy Gradient (TD3) residual for online adaptation. The framework is assessed on the one-dimensional Kuramoto-Sivashinsky equation, a benchmark for spatio-temporal chaos. Starting from an uncontrolled energy of 42.8, ROL drives the system to a steady-state energy of 0.40 ± 0.14, a 99.1% reduction relative to a linear-quadratic regulator (LQR) and a 64.3% reduction relative to a pure TD3 agent. DeepONet attains a training loss of 7.8 × 10⁻⁶ after only 200 epochs, enabling the RL phase to reach its reward plateau 2.5× sooner and with 65% lower variance than the baseline. Spatio-temporal analysis confirms that ROL restricts state amplitudes to a band three-fold tighter than pure TD3 and an order of magnitude narrower than LQR, while halving the system energy within 0.19 simulation time units (33% faster than pure TD3). These results demonstrate that combining operator learning with residual policy optimisation delivers state-of-the-art, sample-efficient stabilisation of chaotic partial differential equations, and offers a scalable template for turbulence suppression, combustion control, and other high-dimensional nonlinear systems.
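For context, a standard controlled form of the one-dimensional Kuramoto-Sivashinsky equation is shown below; the forcing term f(x, t) and the energy measure E(t) are common conventions, as the abstract does not specify the actuation layout or normalisation used:

$$
\partial_t u + u\,\partial_x u + \partial_x^2 u + \partial_x^4 u = f(x,t),
\qquad
E(t) = \int_0^L u^2(x,t)\,\mathrm{d}x,
$$

where u(x, t) is the state, f(x, t) is the control forcing, and E(t) is presumably the energy to which the reported reductions refer.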
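A minimal sketch of the residual composition the abstract describes, assuming a PyTorch-style branch/trunk DeepONet and an arbitrary TD3 actor; all class names, shapes, and the `residual_scale` bound are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Branch net encodes sensed state values; trunk net encodes actuator
    locations; their inner product yields the base control at each actuator."""
    def __init__(self, n_sensors: int, n_actuators: int, width: int = 128, p: int = 64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, width), nn.Tanh(),
                                    nn.Linear(width, p))
        self.trunk = nn.Sequential(nn.Linear(1, width), nn.Tanh(),
                                   nn.Linear(width, p))
        # Fixed actuator locations on a unit interval (illustrative assumption).
        self.register_buffer("x_act", torch.linspace(0.0, 1.0, n_actuators).unsqueeze(-1))

    def forward(self, u_sensors: torch.Tensor) -> torch.Tensor:
        b = self.branch(u_sensors)        # (batch, p)
        t = self.trunk(self.x_act)        # (n_actuators, p)
        return b @ t.T                    # (batch, n_actuators): base control

def rol_action(deeponet: DeepONet, td3_actor: nn.Module,
               u_sensors: torch.Tensor, residual_scale: float = 0.1) -> torch.Tensor:
    """ROL control = frozen operator-learned base law + bounded TD3 residual."""
    with torch.no_grad():                 # offline-trained base policy stays fixed
        base = deeponet(u_sensors)
    residual = td3_actor(u_sensors)       # online adaptation term
    return base + residual_scale * torch.tanh(residual)
```

Bounding the residual (here with a tanh and a hypothetical `residual_scale`) is one common way to keep the online correction from overriding the operator-learned base law; the abstract does not state how ROL constrains its residual. For a quick check, any actor mapping sensor inputs to actuator outputs suffices, e.g. `rol_action(DeepONet(32, 4), nn.Linear(32, 4), torch.randn(1, 32))`.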