Abstract
This paper introduces a cost-effective autonomous docking system for charging that uses a monocular camera and ArUco markers. Traditional monocular vision-based approaches, such as SolvePnP, are sensitive to viewing angle, lighting conditions, and camera calibration errors, which limits the accuracy of spatial estimation. To address these limitations, we propose a regression-based method that learns geometric features from variations in marker size and shape to estimate distance and orientation accurately. The model is trained on ground-truth data collected with a LiDAR sensor, while real-time operation relies on monocular input alone. Experimental results show that the proposed system achieves a mean distance error of 1.18 cm and a mean orientation error of 3.11°, significantly outperforming SolvePnP, which exhibits errors of 58.54 cm and 6.64°, respectively. In real-world docking tests, the system achieves a final average docking position error of 2 cm and an orientation error of 3.07°, demonstrating that reliable and accurate performance can be attained with low-cost, vision-only hardware. The system thus offers a practical and scalable solution for industrial applications.
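As a minimal illustration of the regression idea summarized above, the sketch below fits a distance regressor on one geometric feature, the marker's apparent side length in pixels, against synthetic "ground-truth" distances standing in for LiDAR measurements. The focal length, marker size, noise level, and the single-feature design are all assumptions for illustration; the paper's actual model uses richer size and shape features and predicts orientation as well.

```python
import numpy as np

# Assumed camera/marker parameters (illustrative only).
FOCAL_PX = 600.0      # focal length in pixels
MARKER_SIDE_M = 0.10  # physical marker side length in metres

# Synthetic stand-in for LiDAR ground truth: true distances, and the
# pixel side lengths a pinhole camera would observe, with mild noise.
# Pinhole model: side_px ~= FOCAL_PX * MARKER_SIDE_M / distance.
rng = np.random.default_rng(0)
dist_true = np.linspace(0.5, 3.0, 50)
side_px = FOCAL_PX * MARKER_SIDE_M / dist_true + rng.normal(0.0, 0.2, 50)

# Regress distance on the inverse pixel size (linear least squares),
# since distance is proportional to 1 / side_px under the pinhole model.
X = np.column_stack([1.0 / side_px, np.ones_like(side_px)])
coef, *_ = np.linalg.lstsq(X, dist_true, rcond=None)

# Predict distance for a newly observed marker 40 px wide.
d_hat = float(coef @ np.array([1.0 / 40.0, 1.0]))
print(f"estimated distance: {d_hat:.2f} m")
```

For a 40 px observation the fit recovers roughly FOCAL_PX * MARKER_SIDE_M / 40 = 1.5 m, showing how a learned mapping from marker geometry to distance can sidestep explicit pose solving.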