Abstract
Accurate extrinsic calibration between radar and camera sensors is essential for reliable multi-modal perception in robotics and autonomous navigation. Traditional calibration methods often rely on artificial targets such as checkerboards or corner reflectors, which can be impractical in dynamic or large-scale environments. This study presents a fully targetless calibration framework that estimates the rigid spatial transformation between radar and camera coordinate frames by aligning their observed trajectories of a moving object. The proposed method integrates You Only Look Once version 5 (YOLOv5)-based 3D object localization for the camera stream with Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Random Sample Consensus (RANSAC) filtering for sparse and noisy radar measurements. A passive temporal synchronization technique, based on Root Mean Square Error (RMSE) minimization, corrects timestamp offsets without requiring hardware triggers. Rigid transformation parameters are computed using Kabsch and Umeyama algorithms, ensuring robust alignment even under millimeter-wave (mmWave) radar sparsity and measurement bias. The framework is experimentally validated in an indoor OptiTrack-equipped laboratory using a Skydio 2 drone as the dynamic target. Results demonstrate sub-degree rotational accuracy and decimeter-level translational error (approximately 0.12-0.27 m depending on the metric), with successful generalization to unseen motion trajectories. The findings highlight the method's applicability for real-world autonomous systems requiring practical, markerless multi-sensor calibration.