Abstract
It has long been recognised that variational data assimilation, including four-dimensional variational methods (4D-Var), is grounded in Bayesian inference and gradient-based optimisation. Deep reinforcement learning (RL) employs related mathematical machinery, iteratively minimising a scalar objective function through the backward propagation of error information. In this study we propose no new algorithms or theoretical connections; instead, we provide a transparent, visual illustration of these well-established relationships. Using a compact neural network trained to play the classic Snake game, we track the evolution of every network weight at each training iteration. Short-horizon temporal-difference updates yield frequent local gradient steps on a linearised error signal, closely resembling the inner-loop minimisation of incremental 4D-Var, while experience replay repeatedly recomputes gradients under updated parameters, analogous to outer-loop relinearisation about an evolving reference trajectory. This minimal, fully observable system serves as a controlled laboratory for visualising backward information propagation in optimisation processes familiar to both reinforcement learning and variational data assimilation. The resulting comparison offers an interpretable, pedagogical perspective on reinforcement learning in terms of concepts long established in the data-assimilation literature.
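For readers more familiar with one of the two fields than the other, the standard textbook forms of the two updates being compared are sketched below; the notation is generic and is not taken from this paper. The one-step temporal-difference (Q-learning) update adjusts network weights $\theta$ along the gradient of the value estimate, scaled by the TD error $\delta_t$:
\[
\delta_t \;=\; r_t + \gamma \max_{a'} Q_\theta(s_{t+1}, a') - Q_\theta(s_t, a_t),
\qquad
\theta \;\leftarrow\; \theta + \alpha\,\delta_t\,\nabla_\theta Q_\theta(s_t, a_t).
\]
The incremental 4D-Var inner loop minimises a quadratic cost in the increment $\delta\mathbf{x}_0$ about a reference trajectory:
\[
J(\delta\mathbf{x}_0) \;=\; \tfrac{1}{2}\,\delta\mathbf{x}_0^{\mathsf T}\mathbf{B}^{-1}\delta\mathbf{x}_0
\;+\; \tfrac{1}{2}\sum_{i=0}^{N}\bigl(\mathbf{H}_i\mathbf{M}_i\,\delta\mathbf{x}_0 - \mathbf{d}_i\bigr)^{\mathsf T}\mathbf{R}_i^{-1}\bigl(\mathbf{H}_i\mathbf{M}_i\,\delta\mathbf{x}_0 - \mathbf{d}_i\bigr),
\]
where $\mathbf{M}_i$ and $\mathbf{H}_i$ are the tangent-linear model and linearised observation operator, and $\mathbf{d}_i = \mathbf{y}_i - \mathcal{H}_i(\mathbf{x}_i^{\mathrm{ref}})$ are the innovations; the outer loop updates $\mathbf{x}^{\mathrm{ref}}$ and relinearises $\mathbf{M}_i$ and $\mathbf{H}_i$. In the analogy drawn here, the TD error plays a role akin to the innovation, and recomputing gradients over replayed experience under updated parameters parallels the outer-loop relinearisation.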