Statistical information about reward timing is insufficient for promoting optimal persistence decisions


Abstract

When deciding how long to keep waiting for delayed rewards that will arrive at an uncertain time, different distributions of possible reward times dictate different optimal strategies for maximizing reward. When reward timing distributions are heavy-tailed (e.g., waiting on hold) there is a point at which waiting is no longer advantageous because the opportunity cost of waiting is too high. Alternatively, when reward timing distributions have more predictable timing (e.g., uniform), it is advantageous to wait as long as necessary for the reward. Although people learn to approximate optimal strategies, little is known about how this learning occurs. One possibility is that people learn a general cognitive representation of the probability distribution that governs reward timing and then infer a strategy from that model of the environment. Another possibility is that they learn an action policy in a way that depends more narrowly on direct task experience, such that general knowledge of the reward timing distribution is insufficient for expressing the optimal strategy. Here, in a series of studies in which participants decided how long to persist for delayed rewards before quitting, we provided participants with information about the reward timing distribution in several ways. Whether the information was provided through counterfactual feedback (Study 1), previous exposure (Studies 2a and 2b), or description (Studies 3a and 3b), it did not obviate the need for direct, feedback-driven learning in a decision context. Therefore, learning when to quit waiting for delayed rewards might depend on task-specific experience, not solely on probabilistic reasoning.
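The abstract's core premise can be made concrete with a toy Monte Carlo sketch. The code below is illustrative only and does not use the paper's actual task parameters: the delay distributions, giving-up times, and inter-trial interval are all assumptions chosen to show why a heavy-tailed delay distribution rewards quitting early while a uniform one rewards waiting out every delay.

```python
import random

def reward_rate(sample_delay, give_up_time, n=100_000, iti=2.0, seed=0):
    """Estimate rewards earned per second when waiting at most
    `give_up_time` seconds for each delayed reward. A fixed
    inter-trial interval `iti` follows every trial (assumed value)."""
    rng = random.Random(seed)
    rewards, elapsed = 0, 0.0
    for _ in range(n):
        t = sample_delay(rng)
        if t <= give_up_time:        # reward arrives before we quit
            rewards += 1
            elapsed += t + iti
        else:                        # we give up and move on
            elapsed += give_up_time + iti
    return rewards / elapsed

# Uniform delays on [0, 12] s: every delay is bounded, so waiting pays.
uniform = lambda rng: rng.uniform(0, 12)

# Heavy-tailed (shifted Pareto, shape 1.1) delays: most rewards arrive
# quickly, but a few take extremely long.
heavy = lambda rng: 2.0 * (rng.random() ** (-1 / 1.1) - 1)

for T in (2, 6, 12, 30):
    print(f"give-up {T:>2}s  uniform={reward_rate(uniform, T):.3f}  "
          f"heavy={reward_rate(heavy, T):.3f}")
```

Under these assumed parameters the uniform condition's reward rate keeps rising as the giving-up time grows toward the distribution's upper bound, whereas the heavy-tailed condition's rate peaks at a short giving-up time: the long tail makes continued waiting too costly in opportunity-cost terms, matching the abstract's "waiting on hold" example.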
