Abstract
Reliable, low-cost motion capture is crucial for developing and validating multi-robot cooperation systems. To address the high cost of traditional motion capture systems, this study proposes a real-time 6D pose estimation and tracking method for multi-robot systems based on YolPnP-FT. Using only an Intel RealSense D435i depth camera, the system simultaneously performs robot classification, 6D pose estimation, and multi-target tracking in real-world environments. The YolPnP-FT pipeline applies a keypoint confidence filtering strategy (PnP-FT) to the output of the YOLOv8 detection head and uses Gaussian-penalized Soft-NMS to improve robustness under partial occlusion. On top of these detections, a linearly weighted combination of Mahalanobis distance and cosine distance provides stable ID assignment when the robots are visually similar. Experiments show that, at camera heights below 2.5 m, the system achieves an average position error of less than 0.009 m and an average angular error of less than 4.2°, with a stable tracking frame rate of 19.8 FPS at 1920 × 1080 resolution. The perception outputs are further validated in a CoppeliaSim-based simulation environment, confirming their utility for downstream coordination tasks. These results demonstrate that the proposed method is a low-cost, real-time, and deployable perception solution for multi-robot systems.
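To make two of the components named above concrete, the following is a minimal sketch, not the implementation used in this study. It shows the standard Gaussian Soft-NMS score decay, exp(-IoU²/σ), and one plausible form of a linearly weighted Mahalanobis and cosine association cost. The function names, the weight `lam`, the values of `sigma` and `score_thresh`, the use of a 2D planar position with a 2×2 covariance, and the use of the unsquared Mahalanobis distance are all illustrative assumptions; the abstract does not specify them.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def gaussian_soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of deleting boxes.

    sigma and score_thresh are assumed values, not taken from the paper.
    Returns the kept indices in selection order and their final scores.
    """
    scores = list(scores)  # work on a copy so the caller's list is untouched
    remaining = [i for i in range(len(boxes)) if scores[i] >= score_thresh]
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        kept.append(best)
        remaining.remove(best)
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)  # Gaussian penalty
        remaining = [i for i in remaining if scores[i] >= score_thresh]
    return kept, [scores[i] for i in kept]


def mahalanobis_2d(diff, cov):
    """Mahalanobis distance of a 2D offset under a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = diff
    # diff^T * inverse(cov) * diff, using the closed-form 2x2 inverse
    quad = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(quad)


def cosine_distance(u, v):
    """One minus the cosine similarity of two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def fused_cost(track_pos, track_cov, det_pos, track_feat, det_feat, lam=0.5):
    """Linearly weighted motion + appearance cost (lam is a hypothetical weight)."""
    diff = (det_pos[0] - track_pos[0], det_pos[1] - track_pos[1])
    d_motion = mahalanobis_2d(diff, track_cov)
    d_appearance = cosine_distance(track_feat, det_feat)
    return lam * d_motion + (1.0 - lam) * d_appearance
```

In this sketch a heavily overlapped detection keeps a reduced score rather than being discarded outright, which is the property that helps under partial occlusion. The fused cost would typically populate a track-by-detection cost matrix that is then passed to an assignment solver.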