Abstract
Object detection plays a significant role in many industrial and scientific domains, particularly autonomous driving, where it enables vehicles to detect surrounding objects, construct spatial maps, and navigate safely. A variety of sensors have been employed for these tasks, including LiDAR, radar, RGB cameras, and ultrasonic sensors. Among these, LiDAR and RGB cameras are frequently used because of their complementary strengths. RGB cameras offer high-resolution images with rich color and texture information but tend to underperform in low light or adverse weather. In contrast, LiDAR provides precise 3D geometric data irrespective of lighting conditions, although it lacks the high spatial resolution of cameras. Recently, thermal cameras have gained significant attention, both as standalone sensors and in combination with RGB cameras, because they offer strong perception capabilities in low visibility and adverse weather. Multimodal sensor fusion can overcome the limitations of individual sensors. In this paper, we propose a novel multimodal fusion method that integrates LiDAR, a 360° RGB camera, and a 360° thermal camera to leverage the strengths of each modality. Our method employs a feature-level fusion strategy that temporally accumulates and synchronizes multiple LiDAR frames. This design not only improves detection accuracy but also enhances spatial coverage and robustness. The use of 360° images significantly reduces blind spots and provides comprehensive environmental awareness, which is especially beneficial in complex or dynamic scenes.