Abstract
6DoF pose estimation is a key technology for robotic grasping. Because transparent objects lack texture, most existing 6DoF pose estimation methods perform poorly on them. In this work, a hierarchical feature fusion network, HFF6DoF, is proposed for 6DoF pose estimation of transparent objects. In HFF6DoF, appearance and geometry features are extracted from RGB-D images with a dual-branch network and hierarchically fused to aggregate complementary information. A decoding module then performs semantic segmentation and keypoint vector-field prediction. From the segmentation and keypoint predictions, the 6DoF poses of transparent objects are computed using Random Sample Consensus (RANSAC) and least-squares fitting. In addition, a new 6DoF pose estimation dataset for transparent objects, TDoF20, is constructed, consisting of 61,886 pairs of RGB and depth images covering 20 object categories. Experimental results show that the proposed HFF6DoF outperforms state-of-the-art approaches on the TDoF20 dataset by a large margin, achieving an average ADD of 50.5%.
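The final stage described above, recovering a rigid pose from predicted keypoints with RANSAC and least-squares fitting, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes 3D-3D correspondences between model keypoints and their predicted camera-frame locations, uses the standard SVD-based (Kabsch) solution for the least-squares fit, and the function names (`kabsch`, `ransac_pose`) and parameters (`iters`, `thresh`) are hypothetical.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) with dst ~ R @ src + t,
    solved via SVD (the Kabsch algorithm)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)        # 3x3 covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

def ransac_pose(src, dst, iters=100, thresh=0.01, rng=None):
    """RANSAC over 3D-3D keypoint correspondences: repeatedly fit a pose
    to a minimal sample of 3 points, keep the hypothesis with the most
    inliers, then refit on all inliers by least squares."""
    rng = np.random.default_rng(rng)
    n = len(src)
    best_inliers = None
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return kabsch(src[best_inliers], dst[best_inliers])
```

In a full pipeline, `src` would be the object's canonical keypoint coordinates and `dst` the keypoints voted from the predicted vector fields; the RANSAC loop makes the fit robust to the outlier keypoints that frequently arise on transparent surfaces.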