Articulated Signed Distance Functions (A-SDF)

Mar 2022

Wufei Ma
Purdue University

Abstract

Paper reading notes for A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation [1].

In this work, the authors propose articulated signed distance functions (A-SDF), a differentiable category-level representation for articulated objects that can reconstruct and predict 3D shape under different articulations. They demonstrate that the articulation input can be controlled to animate unseen instances with unseen joint angles. Furthermore, they propose a Test-Time Adaptation inference algorithm that adjusts the model during inference.

Articulated Signed Distance Function (A-SDF)

The model takes sampled 3D point locations, shape codes, and articulation codes as inputs, and outputs SDF values (signed distance) that measure the distance of a point to the closest surface point.
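To make the output quantity concrete, here is a minimal sketch of a signed distance function for a shape where it has a closed form. The sphere, the function name `sphere_sdf`, and the sign convention (negative inside, positive outside) are assumptions for illustration; they are not taken from the paper.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to the surface of a sphere.

    Assumed sign convention: negative inside, zero on the surface,
    positive outside.
    """
    return math.dist(point, center) - radius

# Sampled 3D query points paired with their SDF values.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
sdf_values = [sphere_sdf(p) for p in points]
print(sdf_values)  # [-1.0, 0.0, 1.0]
```

A-SDF replaces the closed form with a learned network, so that the same kind of query works for shapes that have no analytic expression.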

Formulation. Consider a training set of $N$ instance models of one object category, where each instance is articulated into $M$ poses, giving $N \times M$ training shapes for the category. Each shape $\mathcal{X}_{n, m}$ is assigned a shape code $\phi_n \in \mathbb{R}^C$ and an articulation code $\psi_m \in \mathbb{R}^D$. The articulated signed distance function is implemented as an auto-decoder composed of a shape encoder $f_s$ and an articulation network $f_a$: \[ f_\theta(x, \phi, \psi) = f_a[f_s(x, \phi), x, \psi] = s \] where $x$ is a sampled 3D point and $s$ is the predicted SDF value at $x$.
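The composition of the two networks can be sketched in a few lines of plain Python. This only illustrates the data flow: the layer sizes, the random weights, and the helper names (`linear`, `relu`, `init_layer`) are hypothetical, and the networks in the paper are deeper and trained.

```python
import random

def linear(weights, bias, inputs):
    """Apply one fully connected layer: weights @ inputs + bias."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def init_layer(rng, n_out, n_in):
    """Random weights and zero biases for a toy layer (hypothetical init)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

# Assumed sizes: shape code C, articulation code D, hidden width H.
C, D, H = 4, 1, 8
rng = random.Random(0)
shape_layer = init_layer(rng, H, 3 + C)       # f_s sees (x, phi)
artic_hidden = init_layer(rng, H, H + 3 + D)  # f_a sees (f_s output, x, psi)
artic_out = init_layer(rng, 1, H)

def f_s(x, phi):
    """Shape encoder: 3D point and shape code -> shape feature vector."""
    return relu(linear(*shape_layer, list(x) + list(phi)))

def f_a(shape_feature, x, psi):
    """Articulation network: shape feature, point, and psi -> scalar SDF."""
    hidden = relu(linear(*artic_hidden,
                         list(shape_feature) + list(x) + list(psi)))
    return linear(*artic_out, hidden)[0]

def f_theta(x, phi, psi):
    """f_theta(x, phi, psi) = f_a[f_s(x, phi), x, psi] = s."""
    return f_a(f_s(x, phi), x, psi)

x = (0.1, -0.2, 0.3)
phi = [0.0] * C
s_a = f_theta(x, phi, psi=[0.0])  # same shape code, one articulation
s_b = f_theta(x, phi, psi=[1.0])  # same shape code, another articulation
```

Note that the point $x$ is fed to both networks, while $\psi$ enters only through $f_a$; this is what lets the articulation be changed without touching the shape code.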

Training. Let $K$ be the number of points sampled per shape, and let $s_k$ be the ground-truth SDF value at point $x_k$. The reconstruction loss is given by \[ \mathcal{L}^s(\mathcal{X}, \phi, \psi) = \frac{1}{K} \sum_{k=1}^K \lVert f_\theta(x_k, \phi, \psi) - s_k \rVert_1 \] A zero-mean multivariate Gaussian prior on each shape latent code $\phi$ is used to facilitate learning a continuous shape manifold, which adds a regularization term to the full training loss: \[ \mathcal{L}(\mathcal{X}, \phi, \psi) = \mathcal{L}^s(\mathcal{X}, \phi, \psi) + \lambda_\phi \cdot \lVert \phi \rVert^2 \]
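As a small worked example of the two loss terms, the sketch below evaluates them on hand-picked numbers. The predicted and target values, the two-dimensional code, and $\lambda_\phi = 0.5$ are chosen only to keep the arithmetic easy to follow; they are not values from the paper.

```python
def reconstruction_loss(predicted, target):
    """L^s: mean absolute error between predicted and ground-truth SDF values."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)

def total_loss(predicted, target, phi, lambda_phi):
    """L = L^s + lambda_phi * ||phi||^2 (squared Euclidean norm of phi)."""
    prior = sum(v * v for v in phi)
    return reconstruction_loss(predicted, target) + lambda_phi * prior

predicted = [0.5, -0.25, 0.0, 1.0]  # f_theta(x_k, phi, psi) for K = 4 points
target = [0.25, -0.25, 0.5, 1.0]    # ground-truth s_k
phi = [1.0, -2.0]                   # toy shape code
loss = total_loss(predicted, target, phi, lambda_phi=0.5)
print(loss)  # 2.6875
```

Here the reconstruction term is $(0.25 + 0 + 0.5 + 0) / 4 = 0.1875$ and the prior term is $0.5 \cdot (1 + 4) = 2.5$.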

Basic inference. Given an instance $\mathcal{X}$, the shape and articulation codes can be inferred with back-propagation: \[ \arg\min_{\phi, \psi} \mathcal{L}(\mathcal{X}, \phi, \psi) \] In practice, the articulation code usually converges to a good estimate, but the shape code tends to be noisy. A second optimization over the shape code is therefore run with the estimated $\psi$ fixed.
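The two-stage procedure can be sketched on a toy problem. Everything below is a stand-in chosen for illustration: `toy_decoder` plays the role of the trained $f_\theta$ (a 1D interval with center $\psi$ and half-width $\phi$), central finite differences replace back-propagation, and re-initializing the shape code before the second stage is an assumption of this sketch.

```python
def toy_decoder(x, phi, psi):
    """Hypothetical stand-in for a trained f_theta: SDF of a 1D interval
    centered at psi with half-width phi."""
    return abs(x - psi) - phi

def loss(samples, phi, psi, lambda_phi=1e-4):
    """Mean L1 error on (x, s) samples plus the prior term on phi."""
    data = sum(abs(toy_decoder(x, phi, psi) - s) for x, s in samples)
    return data / len(samples) + lambda_phi * phi * phi

def minimize(objective, params, steps=2000, lr=0.01, eps=1e-5):
    """Plain gradient descent; central differences stand in for back-propagation."""
    params = list(params)
    for _ in range(steps):
        grads = []
        for i in range(len(params)):
            up = params[:i] + [params[i] + eps] + params[i + 1:]
            down = params[:i] + [params[i] - eps] + params[i + 1:]
            grads.append((objective(up) - objective(down)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

# Observed instance: generated with phi = 0.5, psi = 0.2 (unknown to the optimizer).
xs = [i / 20 - 1.0 for i in range(41)]
samples = [(x, toy_decoder(x, 0.5, 0.2)) for x in xs]

# Stage 1: optimize both codes jointly, keep only the articulation estimate.
phi_stage1, psi_hat = minimize(lambda p: loss(samples, p[0], p[1]), [0.0, 0.0])

# Stage 2: optimize the shape code again with psi_hat held fixed.
(phi_hat,) = minimize(lambda p: loss(samples, p[0], psi_hat), [0.0])
```

In this toy setting both stages recover the codes easily; the second stage matters in the real model, where the jointly optimized shape code is the noisy one.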

Test-Time Adaptation inference. Keeping the network parameters fixed can be problematic for out-of-distribution instances. With the estimated codes $\hat{\phi}$ and $\hat{\psi}$ held fixed, the authors keep the articulation network fixed and update the shape encoder as follows \[ \hat{f}_s = \arg\min_{f_s} \mathcal{L}(\mathcal{X}, \hat{\phi}, \hat{\psi}) \]
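A minimal sketch of this update, again on a toy 1D model: the shape encoder has a single trainable weight `w`, the articulation network has no trainable parameters, and the codes are fixed. The toy networks, the numeric values, and the subgradient step are all assumptions for illustration. The prior term $\lambda_\phi \lVert \hat{\phi} \rVert^2$ is constant with respect to $f_s$, so it is left out of the update.

```python
def f_s(x, phi, w):
    """Toy shape encoder with one trainable weight w (hypothetical)."""
    return w * phi

def f_a(shape_feature, x, psi):
    """Toy articulation network, kept frozen during adaptation."""
    return abs(x - psi) - shape_feature

def l1_loss(samples, phi, psi, w):
    """Mean L1 error between the toy model and the observed SDF samples."""
    total = sum(abs(f_a(f_s(x, phi, w), x, psi) - s) for x, s in samples)
    return total / len(samples)

def sign(v):
    return (v > 0) - (v < 0)

def adapt_shape_encoder(samples, phi_hat, psi_hat, w, steps=500, lr=0.01):
    """Update only the shape-encoder weight; codes and f_a stay fixed."""
    for _ in range(steps):
        # Subgradient of the mean L1 loss w.r.t. w: d(residual)/dw = -phi_hat.
        grad = sum(sign(f_a(f_s(x, phi_hat, w), x, psi_hat) - s) * -phi_hat
                   for x, s in samples) / len(samples)
        w -= lr * grad
    return w

# Out-of-distribution instance: an interval with half-width 0.8, which the
# "trained" weight w = 1.0 cannot reproduce from the fixed code phi_hat = 0.5.
xs = [i / 20 - 1.0 for i in range(41)]
samples = [(x, abs(x - 0.2) - 0.8) for x in xs]
phi_hat, psi_hat, w_trained = 0.5, 0.2, 1.0

loss_before = l1_loss(samples, phi_hat, psi_hat, w_trained)
w_adapted = adapt_shape_encoder(samples, phi_hat, psi_hat, w_trained)
loss_after = l1_loss(samples, phi_hat, psi_hat, w_adapted)
```

Only `w` changes between `loss_before` and `loss_after`, which mirrors the idea of adapting $f_s$ to the observed instance while the articulation behavior learned by $f_a$ is left untouched.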

Results

Quantitative and qualitative results. Also see the project page.

References

[1] J. Mu, W. Qiu, A. Kortylewski, A. Yuille, N. Vasconcelos, X. Wang. A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation. In ICCV, 2021.