Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation

Abstract

Depth estimation is an inverse projection problem that estimates pixel-level distances from a single image. Although supervised methods have shown promising results, they have an intrinsic limitation: they require ground-truth depth from an external sensor. Self-supervised depth estimation, on the other hand, relieves the burden of collecting calibrated training data, but a large performance gap remains between supervised and self-supervised methods. The objective of this study is to reduce that gap. The loss function of previous self-supervised methods is mainly based on a photometric error, which is computed indirectly from images synthesized using depth and pose estimates. In this paper, we argue that a direct depth cue is more effective for training a depth estimation network. To obtain the direct depth cue, we employed knowledge distillation, a teacher-student learning framework. The teacher network was trained in a self-supervised manner based on a photometric error, and its predictions were used to train a student network. We constructed a multi-scale dense prediction transformer with Monte Carlo dropout, and we propose a multi-scale distillation loss that trains the student network on the ensemble of stochastic estimates. Experiments were conducted on the KITTI and Make3D datasets, and the proposed method achieved state-of-the-art accuracy in self-supervised depth estimation. Our code is publicly available at https://github.com/ji-min-song/KD-of-MS-DPT .
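The teacher-student idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (see their repository for that), and the abstract does not give the exact form of the loss. Everything named here is an assumption made for illustration: the functions `mc_dropout_ensemble`, `downsample2x`, `distillation_loss`, and `multi_scale_distillation_loss`, the toy teacher, the variance-based down-weighting `1 / (1 + variance)`, and the use of 2x average pooling to build coarser scales.

```python
import random
from statistics import fmean, pvariance


def toy_teacher(image):
    # Hypothetical stand-in for a teacher run with dropout left active:
    # it returns the input map perturbed by random multiplicative noise.
    return [[d * random.uniform(0.9, 1.1) for d in row] for row in image]


def mc_dropout_ensemble(stochastic_teacher, image, num_samples=8):
    """Run a stochastic teacher several times; return per-pixel mean and variance."""
    samples = [stochastic_teacher(image) for _ in range(num_samples)]
    height, width = len(samples[0]), len(samples[0][0])
    mean = [[fmean(s[y][x] for s in samples) for x in range(width)]
            for y in range(height)]
    var = [[pvariance([s[y][x] for s in samples]) for x in range(width)]
           for y in range(height)]
    return mean, var


def downsample2x(depth):
    """Average-pool a 2-D map by a factor of two (assumes even height and width)."""
    return [[(depth[y][x] + depth[y][x + 1]
              + depth[y + 1][x] + depth[y + 1][x + 1]) / 4.0
             for x in range(0, len(depth[0]), 2)]
            for y in range(0, len(depth), 2)]


def distillation_loss(student, teacher_mean, teacher_var):
    """Mean absolute error, down-weighted where the teacher samples disagree."""
    total, count = 0.0, 0
    for s_row, m_row, v_row in zip(student, teacher_mean, teacher_var):
        for s, m, v in zip(s_row, m_row, v_row):
            total += abs(s - m) / (1.0 + v)  # assumed weighting, not from the paper
            count += 1
    return total / count


def multi_scale_distillation_loss(student_pyramid, teacher_mean, teacher_var):
    """Sum the per-scale losses; student_pyramid is ordered fine to coarse."""
    loss = 0.0
    for level, student in enumerate(student_pyramid):
        if level > 0:
            # Pool the teacher statistics to the student's resolution.
            # Pooling the variance is a simplification made for this sketch.
            teacher_mean = downsample2x(teacher_mean)
            teacher_var = downsample2x(teacher_var)
        loss += distillation_loss(student, teacher_mean, teacher_var)
    return loss
```

In this sketch the student never sees a photometric error: its only training signal is the teacher's averaged depth, which is the "direct depth cue" the abstract argues for. In a real network the ensemble would come from repeated forward passes with dropout enabled at inference time, and the student would output the pyramid of depth maps itself.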
