Offline reinforcement learning combining generalized advantage estimation and modality decomposition interaction


Abstract

Transformers show great potential in offline reinforcement learning by modeling trajectories as sequences for action prediction. However, existing Transformer-based methods face limitations such as ineffective trajectory stitching and the neglect of deep interactions within and between the modalities of a trajectory. We propose CGM, an offline reinforcement learning approach that combines Generalized Advantage Estimation with Modality Decomposition Interaction (MDI) to address these challenges. Generalized Advantage Estimation relabels the dataset to improve trajectory stitching. MDI consists of an encoder and a decoder. The encoder integrates an intra-modal interaction mechanism based on ConvFormer and an inter-modal interaction mechanism based on a dual-Transformer architecture, enabling information exchange both within and across modalities. In intra-modal interaction, the convolutional properties of ConvFormer capture the associative information within the state and action modalities respectively. In inter-modal interaction, the dual-Transformer architecture exchanges multimodal information for states and actions separately, fully exploring potential correlations between the different modalities to achieve deep cross-modal interaction. The decoder uses advantage values to optimize action prediction. We compared CGM with state-of-the-art baselines on the D4RL benchmark; on the MuJoCo tasks, CGM outperforms the best baseline by 2.89%.
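The abstract mentions relabeling the dataset with Generalized Advantage Estimation. The paper's exact relabeling procedure is not given here, but the standard GAE recursion it builds on can be sketched as follows; the function name, the zero bootstrap at the trajectory end, and the default `gamma`/`lam` values are illustrative assumptions, not taken from the paper:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard Generalized Advantage Estimation (GAE) sketch.

    Computes an exponentially weighted sum of TD residuals
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t),
    accumulated backward through the trajectory.
    Assumes the trajectory terminates (bootstrap value of 0 at the end).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Bootstrap with 0 beyond the final step (terminal state assumed).
        next_value = values[t + 1] if t + 1 < len(rewards) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        # GAE recursion: A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `gamma = lam = 1.0` this reduces to the Monte Carlo advantage (return minus value), while smaller `lam` trades variance for bias; a relabeling scheme would attach such advantage values to the stored transitions before sequence modeling.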
