Normalized Object Coordinate Space (NOCS)
NOCS. Category-level 6D object pose and size estimation predicts a tight oriented bounding box around an object. Since no exact CAD models are available for category-level tasks, the first challenge is to find a representation that defines 6D pose and size consistently across different object instances. NOCS is defined as a 3D space contained within a unit cube, $(x, y, z) \in [0, 1]^3$. The known CAD models of each category are normalized such that the diagonal of each model's tight bounding box has length 1 and the model is centered in the cube. Object centers and orientations are also aligned consistently within a category. The CNN then predicts the 2D perspective projection of the color-coded NOCS coordinates, i.e., a NOCS map.
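The normalization above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code; the function name `to_nocs` and the toy box model are invented for the example.

```python
import numpy as np

def to_nocs(points):
    """Map a CAD model point cloud into NOCS: center it, scale so the
    tight bounding-box diagonal has length 1, then shift into [0, 1]^3."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    center = (mins + maxs) / 2.0
    diagonal = np.linalg.norm(maxs - mins)   # tight bbox diagonal
    normalized = (points - center) / diagonal  # diagonal length becomes 1
    return normalized + 0.5                  # [-0.5, 0.5]^3 -> [0, 1]^3

# a toy "model": the 8 corners of a 2 x 1 x 1 box
pts = np.array([[x, y, z] for x in (0.0, 2.0)
                          for y in (0.0, 1.0)
                          for z in (0.0, 1.0)])
nocs = to_nocs(pts)
```

After normalization every coordinate lies in the unit cube and the bounding-box diagonal is exactly 1, so objects of any metric size share the same canonical space.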
Method. The proposed method combines a Mask R-CNN that estimates the class label, instance mask, and NOCS map from the RGB image with a pose-fitting algorithm. Three heads are added to the Mask R-CNN architecture to predict the $x$, $y$, and $z$ components of the NOCS map. For the NOCS heads, the coordinate prediction can be treated as classification over discretized values with a standard softmax loss, or as direct regression, where a soft L1 loss is used to make learning more robust.
\[ \mathcal{L}(\mathbf{y}, \mathbf{y}^*) = \frac{1}{n} \sum_{i=1}^{n} \begin{cases} 5\,(y_i - y_i^*)^2 & \lvert y_i - y_i^*\rvert \leq 0.1 \\ \lvert y_i - y_i^*\rvert - 0.05 & \lvert y_i - y_i^*\rvert > 0.1 \end{cases} \]
where $\mathbf{y}$ is the predicted NOCS map, $\mathbf{y}^*$ the ground truth, and $n$ the number of pixels.
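The soft L1 loss above is straightforward to implement; note that the two branches agree at $\lvert y_i - y_i^*\rvert = 0.1$ (both give $0.05$), so the loss is continuous. A minimal numpy sketch:

```python
import numpy as np

def soft_l1(y, y_star):
    """Soft L1 loss from the equation above, averaged over the n values:
    quadratic near zero, linear for large residuals."""
    diff = np.abs(y - y_star)
    per_elem = np.where(diff <= 0.1, 5.0 * diff**2, diff - 0.05)
    return per_elem.mean()
```

The quadratic region gives smooth gradients for small residuals, while the linear region keeps large outliers from dominating the loss.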
For object categories that are symmetric about an axis, e.g., bottles, we can define an axis of symmetry, generate $\lvert \theta \rvert$ copies of the ground truth rotated about that axis, and extend the loss to $\mathcal{L}_\text{s} = \min_{i=1, \dots, \lvert \theta \rvert} \mathcal{L}(\mathbf{y}, \mathbf{y}^*_{\theta_i})$, so the network is not penalized for predicting any of the visually indistinguishable orientations. In practice, $\lvert \theta \rvert \leq 6$ is often sufficient.
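The min-over-rotations loss can be sketched as follows. This is an illustrative assumption-laden version: it takes $y$ as the symmetry axis, rotates the ground-truth NOCS coordinates about the cube center $(0.5, 0.5, 0.5)$, and uses a plain per-element L1 as the base loss; none of these choices are mandated by the text above.

```python
import numpy as np

def l1(y, y_star):
    # simple per-element L1 base loss (stand-in for the soft L1)
    return np.abs(y - y_star).mean()

def rot_y(theta):
    # rotation about the y axis, assumed here to be the symmetry axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def symmetric_loss(y, y_star, base_loss=l1, n_sym=6):
    """min over n_sym rotated copies of the ground-truth NOCS coordinates,
    rotated about the symmetry axis through the cube center (0.5, 0.5, 0.5)."""
    center = np.full(3, 0.5)
    losses = []
    for i in range(n_sym):
        R = rot_y(2.0 * np.pi * i / n_sym)
        y_star_rot = (y_star - center) @ R.T + center
        losses.append(base_loss(y, y_star_rot))
    return min(losses)
```

A prediction that matches the ground truth up to one of the sampled rotations then incurs (near-)zero loss, even though the plain base loss would be large.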
Pose fitting. The goal is to estimate the full metric 6D pose and dimensions of the detected objects. We first obtain a 3D point cloud $P_m$ of the object by back-projecting the depth map within the predicted object mask. The NOCS map likewise yields a 3D representation $P_n$ in the normalized space. We then estimate the scale, rotation, and translation that transform $P_n$ to $P_m$. The Umeyama algorithm, combined with RANSAC for outlier removal, is used for this 7-dimensional similarity transform estimation (3 for rotation, 3 for translation, 1 for scale).
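The core of the fitting step, the Umeyama closed-form similarity transform, can be sketched as below. This is a minimal version without the RANSAC outlier loop; it recovers $s$, $R$, $t$ such that $P_m \approx s\,R\,P_n + t$ from corresponding point pairs.

```python
import numpy as np

def umeyama(src, dst):
    """Least-squares similarity transform (Umeyama, 1991) mapping
    src (e.g. NOCS points P_n) to dst (e.g. metric points P_m):
    dst_i ~= s * R @ src_i + t. Sketch without RANSAC."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)        # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                      # keep R a proper rotation
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src  # optimal isotropic scale
    t = mu_d - s * R @ mu_s
    return s, R, t
```

In the full pipeline one would wrap this in RANSAC: repeatedly fit on random minimal subsets of the $P_n \leftrightarrow P_m$ correspondences and keep the transform with the most inliers, which guards against errors in the predicted NOCS map.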