Abstract
Widespread acceptance of collaborative robots in human-involved scenarios requires accessible and intuitive interfaces for lay workers and non-expert users. Existing interfaces often rely on users to plan and issue low-level commands, necessitating extensive knowledge of robot control. This study proposes a multimodal agentic AI framework that integrates natural user interfaces (NUIs) to foster effortless, human-like partnerships in human-robot collaboration (HRC), enhancing intuitiveness and operational efficiency. First, it allows users to instruct robots verbally in plain language, coupled with gaze input to indicate target objects precisely. Second, it offloads the burden of robot motion planning from users by understanding context and reasoning about task decomposition. Third, by coordinating AI agents built on large language models (LLMs), the system interprets users' requests effectively and provides feedback to establish transparent communication. This proof-of-concept study included experiments demonstrating a practical implementation of the agentic AI framework on a mobile manipulation robot in a collaborative human-robot wood-assembly task. Seven participants were recruited to interact with the AI-integrated agentic robotic system. Task performance and user experience were measured in terms of completion time, intervention rate, and workload via the NASA-TLX survey, and insights for practical applications were summarized through a qualitative analysis. This study highlights the potential of NUIs and agentic AI-embodied robots to overcome existing HRC barriers and contributes to more intuitive and efficient HRC.