Abstract
Monocular depth estimation, which derives the depth of a scene from a single image, is a key task in autonomous driving and a fundamental component of vehicle perception and decision-making. However, existing approaches face challenges such as visual artifacts, scale ambiguity, and occlusion handling. These limitations lead to suboptimal performance in complex environments, reducing model efficiency and generalization and hindering broader use in autonomous driving and other applications. To address these challenges, this paper introduces a Neural Radiance Field (NeRF)-based monocular depth estimation method for autonomous driving. A Gaussian probability-based ray sampling strategy mitigates the excessive number of sampling points required in large, complex scenes and reduces computational cost. To improve generalization, a lightweight spherical network with a fine-grained adaptive channel attention mechanism is designed to capture detailed pixel-level features; these features are then mapped to 3D spatial sampling locations, yielding diverse and expressive point representations that improve the generalizability of the NeRF model. Our approach achieves strong performance on the KITTI benchmark, surpassing traditional methods on depth estimation tasks. This work contributes practical technical advances toward monocular depth estimation in autonomous driving applications.
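As a rough illustration of the Gaussian probability-based ray sampling idea mentioned above, the following PyTorch sketch concentrates per-ray sample depths around a coarse depth prior instead of spreading them uniformly over the full near-far interval. The function name, the use of a coarse depth estimate, and the `sigma` parameter are assumptions introduced here for illustration and are not the paper's exact formulation.

```python
import torch

def gaussian_ray_sampling(coarse_depth, near, far, n_samples=32, sigma=1.0):
    """Sample depths along each ray from a Gaussian centered on a coarse depth
    estimate (hypothetical interface; the paper's strategy may differ).

    coarse_depth: (n_rays,) rough per-ray depth prior, e.g. from a coarse pass.
    Returns: (n_rays, n_samples) sorted sample depths clamped to [near, far].
    """
    n_rays = coarse_depth.shape[0]
    # Draw samples concentrated near the expected surface rather than
    # uniformly across the whole near-far range, reducing wasted samples.
    z = coarse_depth[:, None] + sigma * torch.randn(n_rays, n_samples)
    z = z.clamp(min=near, max=far)
    # Sort so downstream alpha compositing sees monotonically increasing depths.
    z, _ = torch.sort(z, dim=-1)
    return z

# Minimal usage example with made-up values.
coarse = torch.full((4,), 10.0)           # 4 rays, coarse depth of ~10 m each
samples = gaussian_ray_sampling(coarse, near=0.5, far=80.0)
print(samples.shape)                       # torch.Size([4, 32])
```

Compared with uniform stratified sampling, this kind of prior-guided sampling spends most of its budget near likely surfaces, which is the computational saving the abstract alludes to for large driving scenes.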