Instance- and Category-level 6D Object Pose Estimation

Dec, 2021

Wufei Ma
Purdue University

Abstract

Paper reading notes for Instance- and Category-level 6D Object Pose Estimation [1].

In this work, the authors reviewed recent works on instance- and category-level 6D object pose estimation from RGB and RGB-D data.

6D Object Pose Estimation

Instance-level 6D object pose estimation estimates the 6D poses of seen object instances, with methods mainly aiming to report improved results in the face of instance-level challenges such as viewpoint variability, occlusion, clutter, and similar-looking distractors. However, instance-based methods do not easily generalize to category-level 6D object pose estimation, which involves additional challenges such as the distribution shift between source and target domains, high intra-class variation, and shape discrepancies between objects.

The authors formulated instance-level 6D pose estimation as follows: given an RGB-D image $I$ in which an instance $S$ of the object of interest $O$ exists, we estimate the 3D translation $\mathbf{x}=(x, y, z)$ and the 3D rotation $\theta = (r, p, y)$ (roll, pitch, yaw) as \[ (\mathbf{x}, \theta)^* = \arg\max_{\mathbf{x}, \theta} p(\mathbf{x}, \theta \mid I, S) \] Extensions include other input configurations, such as multiple instances $\mathcal{S}$ of $O$, an instance $C$ from a category $c$, and multiple instances $\mathcal{C}$ from a category $c$.
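As a rough illustration of this search over poses, the Python sketch below enumerates pose hypotheses and keeps the highest-scoring one. The `pose_likelihood` scorer and the candidate generator are hypothetical placeholders standing in for whatever model of $p(\mathbf{x}, \theta \mid I, S)$ a concrete method provides; the sketch only mirrors the $\arg\max$ formulation above.

```python
import numpy as np

def euler_to_rotation(r, p, y):
    """Build a rotation matrix from roll-pitch-yaw angles (radians)."""
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(r), -np.sin(r)],
                   [0, np.sin(r),  np.cos(r)]])
    Ry = np.array([[ np.cos(p), 0, np.sin(p)],
                   [0,          1, 0],
                   [-np.sin(p), 0, np.cos(p)]])
    Rz = np.array([[np.cos(y), -np.sin(y), 0],
                   [np.sin(y),  np.cos(y), 0],
                   [0,          0,         1]])
    return Rz @ Ry @ Rx

def estimate_pose(image, instance_mask, candidates, pose_likelihood):
    """Return the candidate (translation, euler_angles) maximizing
    pose_likelihood, a stand-in for p(x, theta | I, S)."""
    best, best_score = None, -np.inf
    for translation, euler in candidates:
        R = euler_to_rotation(*euler)
        score = pose_likelihood(image, instance_mask, R, translation)
        if score > best_score:
            best, best_score = (translation, euler), score
    return best
```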

Challenges

The authors categorized the challenges in 6D object pose estimation to instance-level and category-level.

Challenges of instances:

  • Viewpoint variability.
  • Texture-less objects.
  • Occlusion.
  • Clutter.
  • Similar-looking distractors.

Challenges of categories:

  • Intra-class variation.
  • Distribution shift.

Methods

Instance-based methods.

  • Template-based.
  • Point-to-point.
  • Conventional learning-based.
  • Deep learning.

Category-based methods.

  • 2D.
  • 3D. Methods for 3D object detection focus on finding the bounding volume of objects rather than 6D pose.
  • 4D.
  • 6D. Intrinsic Structure Adaptors (ISA) [2], a part-based random forest architecture, perform full 6D object pose estimation at the level of categories in depth images.

Evaluation Metrics

Average distance (AD). Given the groundtruth pose $(\bar{\mathbf{x}}, \bar{\theta})$ and the estimated pose $(\mathbf{x}, \theta)$, with corresponding rotation matrices and translation vectors $(\bar{R}, \bar{T})$ and $(R, T)$, the average distance $\omega_\text{AD}$ is computed over all points $\mathbf{s}$ of the 3D model $M$ of the object of interest: \[ \omega_\text{AD} = \text{avg}_{\mathbf{s} \in M} \lVert (\bar{R}\mathbf{s} + \bar{T}) - (R\mathbf{s}+T)\rVert \] A hypothesis is considered correct if \[ \omega_\text{AD} \leq z_\omega \Phi \] where $\Phi$ is the diameter of the 3D model $M$ and $z_\omega$ is a threshold coefficient.
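The AD metric is straightforward to compute from the model points and the two poses. Below is a minimal NumPy sketch; the coefficient $z_\omega = 0.1$ (10% of the model diameter, a commonly used setting) is assumed, and the model diameter is computed by brute force over all point pairs.

```python
import numpy as np

def average_distance(model_points, R_gt, T_gt, R_est, T_est):
    """Average distance (AD) between groundtruth and estimated poses.

    model_points: (N, 3) array of points s on the 3D model M.
    (R_gt, T_gt), (R_est, T_est): rotation matrices and translation vectors.
    """
    gt = model_points @ R_gt.T + T_gt       # \bar{R} s + \bar{T}
    est = model_points @ R_est.T + T_est    # R s + T
    return np.linalg.norm(gt - est, axis=1).mean()

def is_correct(model_points, R_gt, T_gt, R_est, T_est, z_omega=0.1):
    """Accept the hypothesis if AD <= z_omega * diameter of the model."""
    # Brute-force pairwise distances; fine for a sketch, O(N^2) memory.
    diffs = model_points[:, None, :] - model_points[None, :, :]
    diameter = np.linalg.norm(diffs, axis=-1).max()
    ad = average_distance(model_points, R_gt, T_gt, R_est, T_est)
    return ad <= z_omega * diameter
```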

Intersection over Union (IoU). Given the estimated and groundtruth bounding boxes $B$ and $\bar{B}$, the IoU $\omega_\text{IoU}$ is given by \[ \omega_\text{IoU} = \frac{\text{area}(B \cap \bar{B})}{\text{area}(B \cup \bar{B})} \] and the acceptance threshold is often set to $\tau_\text{IoU} = 0.5$. For 3D bounding boxes, the IoU is typically computed with the boxes aligned to the gravity direction, with no assumption made about the other two axes.
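For 2D detections, the IoU reduces to a ratio of rectangle areas. A minimal sketch for axis-aligned boxes in $(x_1, y_1, x_2, y_2)$ format:

```python
def iou_2d(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection is accepted when iou_2d(B, B_gt) >= 0.5 under the usual threshold.
```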

References

[1] C. Sahin, G. Garcia-Hernando, J. Sock, and T. Kim. Instance- and Category-level 6D Object Pose Estimation. In RGB-D Image Analysis and Processing, 2019.

[2] C. Sahin and T. Kim. Category-level 6D Object Pose Recovery in Depth Images. In ECCVW, 2018.
