Abstract
Road object segmentation is crucial for autonomous driving, as it enables vehicles to perceive their surroundings. Although deep learning models show promise, generalizing across diverse road conditions, weather variations, and lighting changes remains challenging, and existing approaches often struggle with the varying appearance of road objects under such conditions. Foundation models such as the Segment Anything Model (SAM) offer a potential avenue for improved generalization in complex visual tasks. This study therefore presents a pioneering, comprehensive evaluation of the SAM for zero-shot road object segmentation without explicit prompts, aiming to determine its inherent capabilities and limitations in accurately segmenting a variety of road objects under the diverse and challenging environmental conditions encountered in real-world autonomous driving scenarios. We assessed the SAM's performance on the KITTI, BDD100K, and Mapillary Vistas datasets, which together span a wide range of environmental conditions. Our analysis, based on a variety of established evaluation metrics, reveals where the SAM succeeds and where it fails, particularly highlighting the challenges posed by dynamic environments, illumination changes, and occlusions. These findings provide valuable insights for researchers and developers seeking to enhance the robustness of foundation models such as the SAM in complex road environments, guiding future efforts to improve perception systems for autonomous driving.