Abstract
As autonomous driving technology progresses, LiDAR-based 3D object detection has emerged as a fundamental element of environmental perception systems. PointPillars transforms point cloud data into a two-dimensional pseudo-image and employs a 2D CNN for efficient and precise detection. Nevertheless, this approach faces two primary challenges: (1) the sparsity and unordered structure of raw point clouds hinder the model's capacity to capture local features, reducing detection accuracy; and (2) existing models struggle to detect small objects in complex environments, particularly with respect to orientation estimation. To address these issues, we propose two enhancements: (1) point-level fusion of LiDAR point clouds and RGB images, which integrates the semantic information of 2D images with the geometric features of 3D point clouds to improve model performance in complex scenes; and (2) the incorporation of the Efficient Channel Attention (ECA) mechanism to concentrate on essential features, particularly for small and sparse objects. Experimental results on the KITTI dataset show significant improvements, especially in small object detection tasks such as identifying pedestrians and cyclists. The enhanced model also achieves substantial gains in the Average Orientation Similarity (AOS) metric. Together, these improvements strengthen the vehicle's ability to track and predict object trajectories in dynamic environments, which is critical for reliable recognition and decision-making.
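To make the channel-attention idea concrete, the following is a minimal NumPy sketch of the ECA reweighting step described above: global average pooling over each channel, a 1D convolution across the channel descriptors, a sigmoid gate, and channel-wise rescaling. The uniform averaging weights here are placeholders for the learned 1D-conv kernel, and the function name and shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def eca(x, k=3):
    """Efficient Channel Attention, minimal sketch.

    x: feature map of shape (C, H, W).
    k: 1D-conv kernel size over the channel dimension
       (chosen adaptively from C in the original ECA module).
    """
    c = x.shape[0]
    # Global average pooling: one descriptor per channel -> (C,)
    y = x.mean(axis=(1, 2))
    # 1D convolution across channels; uniform weights stand in
    # for the learned kernel (illustrative only).
    w = np.ones(k) / k
    pad = k // 2
    yp = np.pad(y, pad, mode="edge")
    conv = np.array([np.dot(yp[i:i + k], w) for i in range(c)])
    # Sigmoid gate yields per-channel attention weights in (0, 1).
    a = 1.0 / (1.0 + np.exp(-conv))
    # Reweight each channel of the input feature map.
    return x * a[:, None, None]
```

Because ECA avoids the dimensionality-reduction bottleneck of heavier attention blocks and touches only a (C,)-sized descriptor, it adds negligible cost on top of the PointPillars backbone.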