Decision making for large-scale multi-armed bandit problems using bias control of chaotic temporal waveforms in semiconductor lasers

Abstract

Decision making using photonic technologies has been intensively researched for solving the multi-armed bandit problem, which is fundamental to reinforcement learning. However, these technologies are yet to be extended to large-scale multi-armed bandit problems. In this study, we conduct a numerical investigation of decision making to solve large-scale multi-armed bandit problems by controlling the biases of chaotic temporal waveforms generated in semiconductor lasers with optical feedback. We generate chaotic temporal waveforms using the semiconductor lasers, and each waveform is assigned to a slot machine (or choice) in the multi-armed bandit problem. The biases in the amplitudes of the chaotic waveforms are adjusted based on rewards using the tug-of-war method. Subsequently, the slot machine that yields the maximum-amplitude chaotic temporal waveform with bias is selected. The scaling properties of the correct decision-making process are examined by increasing the number of slot machines to 1024, and the scaling exponent of the power-law distribution is 0.97. We demonstrate that the proposed method outperforms existing software algorithms in terms of the scaling exponent. This result paves the way for photonic decision making in large-scale multi-armed bandit problems using photonic accelerators.
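
The selection loop described in the abstract (one chaotic waveform per slot machine, reward-driven bias adjustment, selection of the maximum biased amplitude) can be sketched in software as follows. This is only an illustrative sketch under stated assumptions, not the authors' laser model or their exact update rule: a logistic map stands in for each semiconductor laser's chaotic waveform, the bias update is a simplified zero-sum tug-of-war style rule, and the names `run_bandit`, `hit_probs`, and `delta` are hypothetical.

```python
import random


def run_bandit(hit_probs, steps=2000, delta=0.05, seed=0):
    """Play a bandit by always picking the arm with the largest biased waveform.

    hit_probs: reward probability of each slot machine (illustrative values).
    Returns how many times each arm was selected.
    """
    rng = random.Random(seed)
    n = len(hit_probs)  # needs at least two arms
    # Assumption: one logistic map per arm stands in for a laser's chaotic waveform.
    state = [rng.uniform(0.1, 0.9) for _ in range(n)]
    bias = [0.0] * n   # one amplitude bias per arm, initially neutral
    picks = [0] * n

    for _ in range(steps):
        # Advance every waveform by one sample (r = 3.99 keeps values inside (0, 1)).
        state = [3.99 * v * (1.0 - v) for v in state]
        # Centre each sample around zero, add its bias, and select the maximum.
        biased = [v - 0.5 + b for v, b in zip(state, bias)]
        chosen = max(range(n), key=biased.__getitem__)
        picks[chosen] += 1

        # Play the chosen slot machine.
        rewarded = rng.random() < hit_probs[chosen]

        # Assumed, simplified tug-of-war style rule: a reward pulls the chosen
        # arm's bias up and the others down; a miss does the opposite.
        # The biases always sum to zero.
        step = delta if rewarded else -delta
        for i in range(n):
            if i == chosen:
                bias[i] += step
            else:
                bias[i] -= step / (n - 1)
    return picks


if __name__ == "__main__":
    # Four hypothetical slot machines; the third has the highest hit probability.
    print(run_bandit([0.2, 0.3, 0.9, 0.4]))
```

In this sketch the waveform samples provide the exploration (any arm can momentarily have the largest amplitude), while the accumulated biases provide the exploitation. How the number of plays needed for a correct decision scales with the number of arms, which is the quantity the abstract reports, is not reproduced here.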
