Apparent learning biases emerge from optimal inference: Insights from master equation analysis

Abstract

Recent studies [S. Palminteri, G. Lefebvre, E. J. Kilford, S. J. Blakemore, PLoS Comput. Biol. 13, e1005684 (2017); G. Lefebvre, M. Lebreton, F. Meyniel, S. Bourgeois-Gironde, S. Palminteri, Nat. Hum. Behav. 1, 0067 (2017)], among others, claim that human behavior in a two-armed Bernoulli bandit task is described by positivity and confirmation biases, thereby implying [S. Palminteri, M. Lebreton, Trends Cogn. Sci. 26, 607-621 (2022)] that "Humans do not integrate new information objectively." The claim rests on fitting human data with a Q-learning model that uses different (and temporally constant) learning rates for positive and negative reward prediction errors. However, we find that even if the agent updates its belief via arguably objective Bayesian inference, fitting the above model still exhibits both biases. This finding seems particularly surprising, because Bayesian inference, when written as an effective Q-learning algorithm, is described by unbiased (and temporally decreasing) learning rates. In this article, we explain the reasons behind this observation by studying the stochastic dynamics of these learning systems using master equations. In particular, we show that confirmation bias and unbiased but temporally decreasing learning rates share the same behavioral signature: lower action-switching probabilities than under temporally constant and unbiased learning rates. Our analysis underscores the need to model temporally varying learning rates in subjects before any claims can be made about their choices being biased.
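To make the two update rules in the abstract concrete, the following minimal sketch contrasts a Q-learning step with separate constant learning rates against a Bayesian update rewritten in Q-learning form. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the prior or the reward coding, so a Beta(1, 1) prior and 0/1 rewards are assumed here, and all function names are hypothetical. Under those assumptions, the Beta-Bernoulli posterior mean obeys Q ← Q + (r − Q) / (a + b + n + 1), so its learning rate is the same for both signs of the prediction error and shrinks as the number of observations n grows.

```python
def asymmetric_q_update(q, reward, alpha_pos, alpha_neg):
    """One Q-learning step with separate, constant learning rates for
    positive and negative reward prediction errors."""
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def bayes_learning_rate(n_seen, prior_a=1.0, prior_b=1.0):
    """Effective learning rate of the Beta-Bernoulli posterior mean after
    n_seen outcomes from this arm (assumed Beta(prior_a, prior_b) prior)."""
    return 1.0 / (prior_a + prior_b + n_seen + 1.0)


def bayes_q_update(q, reward, n_seen, prior_a=1.0, prior_b=1.0):
    """Posterior-mean update written as a Q-learning step: the rate does
    not depend on the sign of the error, only on n_seen."""
    alpha = bayes_learning_rate(n_seen, prior_a, prior_b)
    return q + alpha * (reward - q)


def posterior_mean(rewards, prior_a=1.0, prior_b=1.0):
    """Closed-form Beta-Bernoulli posterior mean, used as a cross-check."""
    return (prior_a + sum(rewards)) / (prior_a + prior_b + len(rewards))


# Hypothetical 0/1 reward sequence from a single arm.
rewards = [1, 0, 1, 1, 0, 1]

q_bayes = posterior_mean([])  # prior mean before any observation
for n_seen, r in enumerate(rewards):
    q_bayes = bayes_q_update(q_bayes, r, n_seen)

# Learning rates used at each step: symmetric and strictly decreasing.
rates = [bayes_learning_rate(n) for n in range(len(rewards))]
```

After the loop, `q_bayes` agrees with `posterior_mean(rewards)`, which is the sense in which Bayesian inference can be read as Q-learning with an unbiased but decaying learning rate. By contrast, a fit restricted to constant `alpha_pos` and `alpha_neg` has no way to represent that decay.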
