Abstract
Targetless LiDAR-camera extrinsic calibration remains challenging due to unreliable cross-modal correspondences and sensitivity to initialization. We present a targetless extrinsic calibration framework based on class-agnostic boundary mask alignment in a shared image-plane representation. The framework first constructs consistent LiDAR-camera mask pairs from camera images and from image-plane depth and intensity projections of the LiDAR point cloud. It then obtains robust initial pose candidates through a bounded, rotation-only global initialization and refines them with a computationally efficient stochastic gradient approximation to estimate the optimal extrinsic parameters. Experiments on the KITTI benchmark demonstrate a superior accuracy-runtime trade-off compared with a segmentation-based global optimization baseline, while real-world driving tests confirm stable cross-modal alignment under vibration and inter-modal timing jitter.