Abstract
Learning relies on multiple cognitive mechanisms, including reinforcement learning (RL) and working memory (WM), which form internal value representations that guide choice. However, it remains unclear how these values are transformed into choices, and how this transformation relates to RL and WM. We analyzed electroencephalography (EEG) data from 510 participants performing the RLWM task. We applied an RLWM-linear ballistic accumulator (RLWM-LBA) model, linking RL- and WM-derived policy estimates to evidence-accumulation dynamics. Using model-derived event-related potential (ERP) analyses, we tested whether neural signatures of RL and WM persist when accounting for choice dynamics, and whether a neural evidence-accumulation signal, the centro-parietal positivity (CPP), emerges in a learning context. Our findings replicate distinct neural correlates of RL and WM and reveal a CPP signal that reflects uncertainty in learned value representations. CPP signals improve model fit and are differentially linked to RL and WM across cognitive load, supporting their role in shaping learning-related decision dynamics.