ToggleMimic: A Two-Stage Policy for Text-Driven Humanoid Whole-Body Control


Abstract

For humanoid robots to interact naturally with humans and integrate seamlessly into daily life, natural language is an essential communication medium. While recent advances in imitation learning have enabled robots to acquire complex motions from expert demonstrations, traditional approaches often rely on rigid task specifications or single-modal inputs, limiting their ability to interpret high-level semantic instructions (e.g., natural language commands) or to switch dynamically between actions. Directly translating natural language into executable control commands remains a significant challenge. To address this, we propose ToggleMimic, an end-to-end imitation learning framework that generates robot motions from textual instructions, enabling language-driven multi-task control. In contrast to prior end-to-end methods that struggle to generalize and single-action models that lack flexibility, ToggleMimic combines: (1) a two-stage policy distillation scheme that efficiently bridges the sim-to-real gap, (2) a lightweight cross-attention mechanism for interpretable text-to-action mapping, and (3) a gating network that improves robustness to linguistic variation. Extensive simulation and real-world experiments demonstrate the framework's effectiveness, generalization capability, and robust text-guided control. This work establishes an efficient, interpretable, and scalable learning paradigm for cross-modal, semantics-driven autonomous robot control.
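The abstract gives no implementation details, but as a rough illustration of how components (2) and (3) might fit together, the sketch below shows one plausible composition: a lightweight cross-attention block that fuses text tokens with the robot's proprioceptive state, followed by a gating network that softly mixes several action heads. Everything here is an assumption for illustration, not the authors' implementation; the module name TextGatedPolicy, all dimensions, the expert heads, and the use of a frozen external text encoder are hypothetical.

```python
# Hypothetical sketch of a text-conditioned policy head in the spirit of the
# abstract's components (2) and (3). Names and dimensions are illustrative
# assumptions, not details from the paper.
import torch
import torch.nn as nn

class TextGatedPolicy(nn.Module):
    def __init__(self, state_dim=64, text_dim=512, action_dim=19,
                 n_experts=4, n_heads=4):
        super().__init__()
        # Project the robot state into the text embedding space so it can
        # serve as the attention query.
        self.state_proj = nn.Linear(state_dim, text_dim)
        # Lightweight cross-attention: state as query, text tokens as
        # keys/values; the attention map is inspectable, which is one way a
        # text-to-action mapping can be made interpretable.
        self.cross_attn = nn.MultiheadAttention(text_dim, n_heads,
                                                batch_first=True)
        # Gating network: distributes weight over action experts from the
        # fused text/state feature.
        self.gate = nn.Sequential(nn.Linear(text_dim, n_experts),
                                  nn.Softmax(dim=-1))
        # One small action head per expert (e.g., per skill or motion type).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(text_dim, 128), nn.ELU(),
                          nn.Linear(128, action_dim))
            for _ in range(n_experts)
        ])

    def forward(self, state, text_tokens):
        # state: (B, state_dim); text_tokens: (B, T, text_dim), e.g. from a
        # frozen pretrained text encoder (an assumption, not from the paper).
        q = self.state_proj(state).unsqueeze(1)            # (B, 1, text_dim)
        fused, attn_w = self.cross_attn(q, text_tokens, text_tokens)
        fused = fused.squeeze(1)                           # (B, text_dim)
        weights = self.gate(fused)                         # (B, n_experts)
        actions = torch.stack([e(fused) for e in self.experts],
                              dim=1)                       # (B, E, action_dim)
        # Gated mixture of expert actions, plus the attention weights for
        # inspection of the text-to-action mapping.
        return (weights.unsqueeze(-1) * actions).sum(dim=1), attn_w
```

In a design like this, paraphrased commands that land near each other in the text encoder's embedding space yield similar gate weights and thus similar mixed actions, which is one plausible reading of the abstract's claim that the gating network improves robustness to linguistic variation.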
