Wisdom of the silicon crowd: LLM ensemble prediction capabilities rival human crowd accuracy

硅谷群体智慧:LLM集成预测能力可与人类群体预测精度相媲美

阅读:1

Abstract

Human forecasting accuracy improves through the "wisdom of the crowd" effect, in which aggregated predictions tend to outperform individual ones. Past research suggests that individual large language models (LLMs) tend to underperform compared to human crowd aggregates. We simulate a wisdom of the crowd effect with LLMs. Specifically, we use an ensemble of 12 LLMs to make probabilistic predictions about 31 binary questions, comparing them with those made by 925 human forecasters in a 3-month tournament. We show that the LLM crowd outperforms a no-information benchmark and is statistically indistinguishable from the human crowd. We also observe human-like biases, such as the acquiescence bias. In another study, we find that LLM predictions (of GPT-4 and Claude 2) improve when exposed to the median human prediction, increasing accuracy by 17 to 28%. However, simply averaging human and machine forecasts yields more accurate results. Our findings suggest that LLM predictions can rival the human crowd's forecasting accuracy through simple aggregation.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。