Abstract
Interactive teleoperation offers an intuitive pathway for human-robot interaction, yet many existing systems rely on dedicated sensors or wearable devices, limiting accessibility and scalability. This paper presents a vision-based teleoperation framework that enables real-time control of an articulated robotic arm (five joints plus a gripper actuator) using human hand tracking from a single standard laptop camera. Hand pose and gesture information are extracted using a real-time landmark estimation pipeline, and a set of compact kinematic descriptors (palm position, apparent hand scale, wrist rotation, hand pitch, and pinch gesture) is mapped to robotic joint commands through a calibration-based control strategy. Commands are transmitted over a lightweight network interface to an embedded controller that executes synchronized servo actuation. To enhance stability and usability, temporal smoothing and rate-limited updates mitigate jitter while preserving responsiveness. In a human-in-the-loop evaluation with 42 participants, the system achieved an 88% success rate (37/42), a completion time of 53.48 ± 18.51 s, a placement error of 6.73 ± 3.11 cm over successful trials (n = 37), and an ease-of-use score of 2.67 ± 1.20 on a 1-5 scale. Results indicate that the proposed approach supports interactive teleoperation without specialized hardware, making it a candidate low-cost platform for robotic manipulation, education, and rapid prototyping.
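The temporal smoothing and rate-limited updates mentioned above might be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, the smoothing factor `alpha`, and the `min_interval` value are all hypothetical choices for the sketch.

```python
import time


class SmoothedCommand:
    """Exponential moving average of a joint command, emitted at a
    capped rate. alpha and min_interval are illustrative values,
    not parameters from the paper."""

    def __init__(self, alpha=0.3, min_interval=0.05):
        self.alpha = alpha                # EMA weight on the newest sample (0..1]
        self.min_interval = min_interval  # minimum seconds between emitted commands
        self._ema = None                  # current smoothed value
        self._last_sent = float("-inf")   # timestamp of the last emitted command

    def update(self, raw_angle, now=None):
        """Fold a raw tracked angle into the EMA; return the smoothed
        command if the rate limit allows, otherwise None."""
        now = time.monotonic() if now is None else now
        if self._ema is None:
            self._ema = raw_angle
        else:
            self._ema = self.alpha * raw_angle + (1 - self.alpha) * self._ema
        if now - self._last_sent >= self.min_interval:
            self._last_sent = now
            return self._ema
        return None
```

Noisy per-frame estimates are absorbed by the EMA, while the interval check keeps the embedded controller from being flooded with near-duplicate servo commands.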