Abstract
Six-dimensional (6D) object pose estimation plays a crucial role in robotic sorting. To improve both the accuracy and the efficiency of pose estimation during grasping, this study proposes a deep learning pose estimation method based on PVN3D. First, the backbone network of the image feature extraction model is optimized with dense connections and learnable grouped convolutions, which improves accuracy while reducing model complexity. Second, the core idea of convolutional neural networks is carried over to point cloud feature extraction: dynamic convolution is used to process point cloud features, making point cloud processing more efficient and flexible. Third, a parameter-free attention mechanism is introduced to further improve accuracy. Extensive experiments on the publicly available LineMOD and YCB-Video datasets, and on three other core datasets of the BOP benchmark, show that the proposed method offers significant advantages in both computational efficiency and estimation accuracy, improving pose estimation accuracy while also enhancing real-time performance.