WS-SfMLearner: self-supervised monocular depth and ego-motion estimation on surgical videos with unknown camera parameters


Abstract

PURPOSE: Accurate depth estimation in surgical videos is a pivotal component of numerous image-guided surgery procedures. However, creating ground truth depth maps for surgical videos is often infeasible due to challenges such as inconsistent illumination and sensor noise. As a result, self-supervised depth and ego-motion estimation frameworks are gaining traction, eliminating the need for manually annotated depth maps. Despite this progress, current self-supervised methods still rely on known camera intrinsic parameters, which are frequently unavailable or unrecorded in surgical environments. We address this gap by introducing a self-supervised system capable of jointly predicting depth maps, camera poses, and intrinsic parameters, providing a comprehensive solution for depth estimation under such constraints.

APPROACH: We developed a self-supervised depth and ego-motion estimation framework incorporating a cost volume-based auxiliary supervision module. This module provides additional supervision for predicting camera intrinsic parameters, allowing for robust estimation even without predefined intrinsics. The system was rigorously evaluated on a public dataset to assess its effectiveness in simultaneously predicting depth, camera pose, and intrinsic parameters.

RESULTS: The experimental results demonstrated that the proposed method significantly improved the accuracy of ego-motion and depth prediction, even when compared with methods incorporating known camera intrinsics. In addition, by integrating our cost volume-based supervision, the accuracy of camera parameter estimation, including intrinsic parameters, was further enhanced.

CONCLUSIONS: We present a self-supervised system for depth, ego-motion, and intrinsic parameter estimation, effectively overcoming the limitations imposed by unknown or missing camera intrinsics. The experimental results confirm that the proposed method outperforms the baseline techniques, offering a robust solution for depth estimation in complex surgical video scenarios, with broader implications for improving image-guided surgery systems.
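Self-supervised frameworks of this kind are trained with a view-synthesis objective: each target pixel is back-projected with the predicted depth, transformed by the predicted relative pose, and re-projected with the predicted intrinsics, so that photometric error between the warped source view and the target view supervises all three predictions jointly. A minimal NumPy sketch of that reprojection geometry is below; the function name and array layout are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def reproject_pixels(depth, K, T):
    """Warp target-frame pixel coordinates into the source frame.

    depth : (H, W) predicted depth map for the target view
    K     : (3, 3) camera intrinsics (here, predicted by the network)
    T     : (4, 4) predicted relative pose, target -> source
    Returns an (H, W, 2) array of source-frame pixel coordinates.
    """
    H, W = depth.shape
    # Homogeneous pixel grid for the target view.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    # Back-project to 3-D camera coordinates using the predicted depth.
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Transform into the source camera frame with the predicted pose.
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    src_cam = (T @ cam_h)[:3]
    # Project with the same predicted intrinsics and dehomogenize.
    proj = K @ src_cam
    uv = proj[:2] / np.clip(proj[2:], 1e-6, None)
    return uv.T.reshape(H, W, 2)
```

In a full training loop these coordinates would drive differentiable bilinear sampling of the source image; because the loss depends on K through both the back-projection and the projection, gradients flow to the intrinsics prediction as well as to depth and pose.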
