Volume 37, Issue 3

In this study, we propose a feature-fusion network that estimates object pose directly from RGB images, without any depth information. First, we introduce a two-stream architecture consisting of a segmentation stream and a regression stream. The segmentation stream processes the spatial embedding features and obtains the corresponding image crop; these features are then coupled with the image crop in the fusion network. Second, we use the efficient perspective-n-point (E-PnP) algorithm in the regression stream to extract robust spatial features between the 3D and 2D keypoints. Finally, we perform iterative refinement with an end-to-end mechanism to improve the estimation performance. We conduct experiments on two public datasets, YCB-Video and the challenging Occluded-LineMOD. The results show that our method outperforms state-of-the-art approaches in both speed and accuracy.
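The PnP step in the regression stream solves for a rigid pose from 3D-2D keypoint correspondences. As an illustration of that underlying problem only (this is a simpler direct-linear-transform solver, not the E-PnP algorithm the paper uses, and the function name is hypothetical), a minimal NumPy sketch:

```python
import numpy as np

def dlt_pnp(pts3d, pts2d):
    """Recover a pose (R, t) from n >= 6 noise-free 3D-2D correspondences
    via the direct linear transform. pts2d are normalized image coordinates
    (intrinsics already removed), so the projection matrix P ~ [R | t]
    up to an unknown scale and sign."""
    n = len(pts3d)
    A = np.zeros((2 * n, 12))
    for i, ((X, Y, Z), (u, v)) in enumerate(zip(pts3d, pts2d)):
        # Each correspondence contributes two linear constraints on P.
        A[2 * i]     = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u]
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v]
    # The null vector of A (last right singular vector) gives P up to scale.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    if np.linalg.det(P[:, :3]) < 0:  # resolve the global sign ambiguity
        P = -P
    # Project the left 3x3 block onto the rotation group; its singular
    # values carry the unknown scale, which we divide out of t.
    U, S, Vt2 = np.linalg.svd(P[:, :3])
    R = U @ Vt2
    t = P[:, 3] / S.mean()
    return R, t
```

In practice an off-the-shelf EPnP implementation would be used instead (for example, OpenCV's `cv2.solvePnP` with the `SOLVEPNP_EPNP` flag), which is faster and more robust to noisy keypoints than plain DLT.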

File
jcst-37-3-719-Highlights.pdf (2.1 MB)

Publication history

Received: 22 January 2021
Accepted: 31 August 2021
Published: 31 May 2022
Issue date: May 2022

Copyright

© Institute of Computing Technology, Chinese Academy of Sciences 2022