Abstract
Balancing stability and flexibility is a fundamental challenge in value-based learning: how does the brain maintain long-term value memories while adapting to new environmental contingencies? To address this, we propose a reinforcement learning model composed of two distinct processes with fast and slow dynamics for updating and forgetting object values. Using a combined theoretical and experimental approach in male macaque monkeys, we validate a key behavioral prediction of this two-rate system: spontaneous recovery of prior value memories following value reversal. At the neural level, we show that single neurons in the ventrolateral prefrontal cortex (vlPFC) temporally multiplex these dynamics, with distinct firing components reflecting fast and slow learning processes. Together, these findings suggest that reward learning and memory are supported by a two-rate system that enables both flexibility and stability, and identify the vlPFC as a critical neural substrate for this mechanism.
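
To illustrate how a two-rate system can produce spontaneous recovery, the following is a minimal sketch and not the fitted model. It assumes that the two processes sum to a single object value, share one reward prediction error, and each decay by a fixed retention factor on every trial. The function name, the parameter values, the reward coding (+1 and -1), and the use of no-feedback trials to probe memory are all illustrative assumptions rather than details taken from the study.

```python
def simulate_two_rate(rewards, alpha_fast=0.21, keep_fast=0.59,
                      alpha_slow=0.02, keep_slow=0.992):
    """Return a list of per-trial (fast, slow, total) values.

    rewards: sequence of outcomes; None marks a trial with no feedback,
    on which both processes only forget.
    All parameter values are illustrative assumptions.
    """
    fast, slow = 0.0, 0.0
    history = []
    for r in rewards:
        # Shared prediction error; taken as zero when no outcome is delivered.
        error = 0.0 if r is None else r - (fast + slow)
        # Fast process: learns quickly, forgets quickly.
        fast = keep_fast * fast + alpha_fast * error
        # Slow process: learns slowly, forgets slowly.
        slow = keep_slow * slow + alpha_slow * error
        history.append((fast, slow, fast + slow))
    return history


# Hypothetical schedule: long acquisition, brief reversal, then no feedback.
schedule = [1.0] * 200 + [-1.0] * 15 + [None] * 30
history = simulate_two_rate(schedule)

end_acq = history[199][2]                    # total value after acquisition
fast_rev, slow_rev, end_rev = history[214]   # state at the end of reversal
recovered = max(total for _, _, total in history[215:])

print(f"after acquisition: {end_acq:.3f}")
print(f"after reversal:    {end_rev:.3f} (fast {fast_rev:.3f}, slow {slow_rev:.3f})")
print(f"peak recovery:     {recovered:.3f}")
```

Under these assumed parameters, the brief reversal drives the summed value negative mainly through the fast process, while the slow process still carries the originally learned positive value. Once feedback stops, the fast component decays within a few trials and the summed value returns toward its pre-reversal sign, which is the spontaneous recovery signature described above.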