Abstract
This paper presents a modular late-fusion framework that integrates Camera, LiDAR, and Radar modalities for object classification in autonomous driving. Rather than relying on complex end-to-end fusion architectures, we train two lightweight, complementary neural networks independently: a CNN for Camera + LiDAR classification trained on KITTI, and a GRU-based radar classifier trained on RadarScenes. A unified 5-class label space is constructed to align the heterogeneous datasets, and its validity is verified through class-distribution analysis. The fusion rule is formally defined as a confidence-weighted decision mechanism. To ensure statistical rigor, we conduct 3-fold cross-validation with three random seeds, reporting the mean and standard deviation of mAP and per-class AP. The Camera + LiDAR model achieves a strong average mAP of 95.34%, while Radar achieves 33.89%, reflecting radar's robustness but limited class granularity. With the proposed late-fusion rule, the system attains 94.97% mAP against KITTI ground truth and 33.74% against RadarScenes. Cross-validated per-class trends confirm complementary sensing: Camera + LiDAR excels at Cars, Bicycles, and Pedestrians, while Radar contributes stability under adverse conditions. The paper also provides a complexity and latency analysis, discusses dataset limitations, clarifies temporal handling for radar, and reviews literature through 2025. The findings show that lightweight late fusion can achieve high reliability while remaining computationally efficient, making it suitable for real-time embedded autonomous driving systems.
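To make the confidence-weighted decision mechanism mentioned above concrete, the following is a minimal illustrative sketch of one possible late-fusion rule over a unified 5-class label space. The class ordering, function names, and the use of each branch's peak probability as its confidence weight are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

# Assumed 5-class unified label space (ordering is illustrative only).
CLASSES = ["car", "pedestrian", "bicycle", "truck", "other"]

def fuse(camera_lidar_probs: np.ndarray, radar_probs: np.ndarray) -> str:
    """Fuse two per-class probability vectors with confidence weighting."""
    # Use each branch's peak probability as its confidence weight (assumption).
    w_cl = camera_lidar_probs.max()
    w_r = radar_probs.max()
    fused = w_cl * camera_lidar_probs + w_r * radar_probs
    fused /= fused.sum()  # renormalize to a probability distribution
    return CLASSES[int(np.argmax(fused))]

# Example: the Camera + LiDAR branch is confident in "car"; radar is diffuse.
p_cl = np.array([0.85, 0.05, 0.05, 0.03, 0.02])
p_r = np.array([0.40, 0.20, 0.15, 0.15, 0.10])
print(fuse(p_cl, p_r))  # -> "car"
```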