Abstract
Roadside perception systems, also known as roadside units (RSUs), are critical in Vehicle-to-Everything (V2X) applications, yet spatio-temporal asynchrony among their sensors severely compromises fusion accuracy. This paper proposes a spatio-temporal synchronization method for millimeter-wave (MMW) radar and camera fusion that integrates target matching based on dynamic time warping (DTW) with spatio-temporal parameter estimation. The method exploits the strength of DTW in time-series alignment to compute the similarity between radar and visual trajectories, which enables target matching and parameter estimation in sparse scenes. It was validated on a real-world dataset containing over 30 pedestrian trajectories, covering scenes with one to six pedestrians. The results indicate a temporal offset of 0.116 s between the camera and radar. After synchronization, the average spatial deviation decreased from 1.4358 to 0.1074 m in the x-direction (across the road) and from 3.0732 to 0.1775 m in the y-direction (along the road). The method therefore provides an efficient solution for deploying roadside perception systems in sparse traffic environments.
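To illustrate the idea of DTW-based trajectory matching, the following is a minimal sketch and not the implementation used in this work. It assumes that radar and camera trajectories have already been expressed as 2-D points in a common ground-plane frame, and it uses the textbook DTW recurrence with Euclidean point distance. The names `dtw_distance` and `match_tracks`, the track identifiers, and the sample coordinates are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of 2-D points (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between samples
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def match_tracks(radar_tracks, camera_tracks):
    """Assign each radar track the camera track with the lowest DTW cost.

    Assumption: a simple nearest-neighbour rule, not a one-to-one assignment.
    """
    return {
        r_id: min(camera_tracks, key=lambda c_id: dtw_distance(r, camera_tracks[c_id]))
        for r_id, r in radar_tracks.items()
    }


# Hypothetical data: the camera track "c0" lags the radar track by one sample.
radar = {"r0": [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]}
camera = {
    "c0": [(0.1, 0.0), (0.1, 0.0), (0.1, 1.0), (0.1, 2.0), (0.1, 3.0)],
    "c1": [(3.0, 3.0), (3.0, 2.0), (3.0, 1.0), (3.0, 0.0)],
}
matches = match_tracks(radar, camera)
```

Because DTW allows one sequence to repeat samples while the other advances, the lagging camera track `c0` still aligns closely with `r0`, which is the property that makes the measure tolerant of an unknown time offset between sensors.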