Distributed representations of temporally accumulated reward prediction errors in the mouse cortex.

Reward prediction errors (RPEs) quantify the difference between expected and actual rewards, serving to refine future actions. Although reinforcement learning (RL) provides ample theoretical evidence suggesting that the long-term accumulation of these error signals improves learning efficiency, it remains unclear whether the brain uses similar mechanisms. To explore this, we constructed RL-based theoretical models and used multiregional two-photon calcium imaging in the mouse dorsal cortex. We identified a population of neurons whose activity was modulated by varying degrees of RPE accumulation. Consequently, RPE-encoding neurons were sequentially activated within each trial, forming a distributed assembly. RPE representations in mice aligned with theoretical predictions of RL, emerging during learning and being subject to manipulations of the reward function. Interareal comparisons revealed a region-specific code, with higher-order cortical regions exhibiting long-term encoding of RPE accumulation. These results present an additional layer of complexity in cortical RPE computation, potentially augmenting learning efficiency in animals.
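As a rough illustration of what "varying degrees of RPE accumulation" could mean, the sketch below computes one-step temporal-difference RPEs and then a leaky running sum of them. It is not the authors' model: the function names, the discount `gamma`, the `decay` parameter, and the toy reward and value sequences are all assumptions chosen for illustration. A `decay` of 0 tracks only the current RPE, whereas a `decay` near 1 integrates RPEs over a long horizon.

```python
def td_errors(rewards, values, gamma=0.9):
    """One-step RPEs: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` must hold one more entry than `rewards`
    (the last entry is the value of the final state).
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def accumulate(deltas, decay):
    """Leaky running sum of RPEs: acc_t = decay * acc_{t-1} + delta_t."""
    acc, out = 0.0, []
    for d in deltas:
        acc = decay * acc + d
        out.append(acc)
    return out


# Toy trial (hypothetical numbers): value estimates start at zero,
# so each RPE simply equals the reward received at that step.
rewards = [1.0, 0.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]

deltas = td_errors(rewards, values)
short_term = accumulate(deltas, decay=0.0)  # no accumulation
mid_term = accumulate(deltas, decay=0.5)    # partial accumulation
long_term = accumulate(deltas, decay=1.0)   # full accumulation
```

Under these assumptions, units with different `decay` values would respond differently to the same RPE sequence, which is one simple way to picture a population tuned to different accumulation timescales.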

Journal:	Science Advances	Impact factor:	12.500
Year:	2025	Citation:	2025 Jan 24; 11(4):eadi4782
doi:	10.1126/sciadv.adi4782	Species:	Mouse
Research area:	Other
