Abstract
With the growing demand for autonomous robotic operation in complex and unstructured environments, traditional semantic SLAM systems, which rely on closed-set semantic vocabularies, are increasingly limited in their ability to robustly perceive and understand diverse and dynamic scenes. This paper focuses on the paradigm shift toward open-world semantic scene understanding in SLAM and provides a comprehensive review of the technological evolution from closed-world assumptions to open-world frameworks. We survey the current state of research in open-world semantic SLAM, highlighting key challenges and frontiers. In particular, we conduct an in-depth analysis of three critical areas: zero-shot open-vocabulary understanding, dynamic semantic expansion, and multimodal semantic fusion. These capabilities are examined for their crucial roles in unknown-class identification, incremental semantic updating, and multisensor perceptual integration. Our main contribution is the first systematic algorithmic benchmarking and performance comparison of representative open-world semantic SLAM systems, which reveals the potential of these core techniques to enhance semantic understanding in complex environments. Finally, we propose several promising directions for future research, including lightweight model deployment, real-time performance optimization, and collaborative multimodal perception, offering a systematic reference and methodological guidance for continued advancement in this emerging field.