Abstract
Although deep learning-based visual SLAM (VSLAM) has achieved remarkable localization accuracy, its high computational cost and latency remain critical bottlenecks for real-time deployment. To address these limitations, this paper presents NeuroFusion-SLAM, a novel multi-sensor fusion framework designed for both efficiency and robustness. By incorporating depthwise separable convolutions, the framework reduces the number of model parameters by approximately 40% and the training time by 49% while preserving localization accuracy, thereby improving real-time inference performance and computational efficiency in large-scale environments. Furthermore, we propose a global edge optimization strategy that integrates sliding-window optimization with a factor graph framework, which improves the global consistency of the system. Extensive experiments on the TUM-VI and KITTI-360 datasets demonstrate that our system achieves real-time performance, with an average latency of 30.4 ms per frame. It runs 3× faster than ORB-SLAM2 and 4× faster than VINS-Mono while maintaining localization accuracy.
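To illustrate why depthwise separable convolution reduces the parameter count, the following minimal sketch compares the number of weights in a standard k × k convolution with that of a depthwise k × k convolution followed by a 1 × 1 pointwise convolution. The function names and the layer sizes (64 input channels, 128 output channels, 3 × 3 kernel) are hypothetical and chosen for illustration only; they are not taken from NeuroFusion-SLAM. The per-layer saving shown here is larger than the roughly 40% reported for the whole model, presumably because only some layers of a network are typically replaced.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def reduction_ratio(c_in: int, c_out: int, k: int) -> float:
    """Separable / standard weight count; algebraically 1/c_out + 1/k**2."""
    return depthwise_separable_params(c_in, c_out, k) / conv_params(c_in, c_out, k)


if __name__ == "__main__":
    # Hypothetical layer sizes, for illustration only.
    c_in, c_out, k = 64, 128, 3
    print("standard :", conv_params(c_in, c_out, k))
    print("separable:", depthwise_separable_params(c_in, c_out, k))
    print("ratio    :", round(reduction_ratio(c_in, c_out, k), 4))
```

For a single layer, the ratio of separable to standard weights is 1/c_out + 1/k², so the saving grows with the kernel size and the number of output channels.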