Articulated Signed Distance Function (A-SDF)
The model takes sampled 3D point locations, a shape code, and an articulation code as input and outputs, for each point, its SDF value: the signed distance from the point to the closest surface point.
Formulation. Consider a training set of $N$ instance models for one object category, where each instance is articulated into $M$ poses, giving $N \times M$ shapes for the category. Each shape $\mathcal{X}_{n, m}$ is assigned a shape code $\phi_n \in \mathbb{R}^C$ and an articulation code $\psi_m \in \mathbb{R}^D$. The articulated signed distance function is implemented as an auto-decoder with a shape encoder $f_s$ and an articulation network $f_a$:
\[ f_\theta(x, \phi, \psi) = f_a[f_s(x, \phi), x, \psi] = s \]
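As a rough illustration of the composition above, the following pure-Python sketch wires a one-layer shape encoder into a two-layer articulation network. The layer sizes, random initialization, ReLU activations, and helper names (`init_layer`, `linear`, `relu`) are assumptions made for this example, not the architecture from the paper.

```python
import math
import random

def init_layer(rng, n_in, n_out):
    """Random weights (list of rows) and zero bias for an affine layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(weights, bias, v):
    """Affine map applied to the vector v (a list of floats)."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

# Illustrative sizes: shape-code dim C, articulation-code dim D, hidden width H.
C, D, H = 8, 1, 16
rng = random.Random(0)
shape_layer = init_layer(rng, 3 + C, H)       # f_s: (x, phi) -> feature
artic_hidden = init_layer(rng, H + 3 + D, H)  # f_a: (feature, x, psi) -> hidden
artic_out = init_layer(rng, H, 1)             # f_a: hidden -> scalar SDF value

def f_s(x, phi):
    """Shape encoder: embeds a 3D point together with the shape code."""
    return relu(linear(*shape_layer, x + phi))

def f_a(feature, x, psi):
    """Articulation network: maps the feature, point, and articulation code to s."""
    hidden = relu(linear(*artic_hidden, feature + x + psi))
    return linear(*artic_out, hidden)[0]

def f_theta(x, phi, psi):
    """Full model: f_a[f_s(x, phi), x, psi] = s."""
    return f_a(f_s(x, phi), x, psi)

x = [0.1, -0.2, 0.3]  # one sampled 3D point
phi = [0.0] * C       # shape code
psi = [0.5]           # articulation code (for example, one joint angle)
s = f_theta(x, phi, psi)
```

Note that the point $x$ is fed to both networks, matching the two occurrences of $x$ in the formula.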
Training. Let $K$ be the number of points sampled per shape, with $x_k$ the $k$-th sampled point and $s_k$ its ground-truth SDF value. The training loss is given by
\[ \mathcal{L}^s(\mathcal{X}, \phi, \psi) = \frac{1}{K} \sum_{k=1}^K \lVert f_\theta(x_k, \phi, \psi) - s_k \rVert_1 \]
A zero-mean multivariate Gaussian prior on each shape latent code $\phi$ facilitates learning a continuous shape manifold; it enters the full objective as an $\ell_2$ penalty weighted by $\lambda_\phi$:
\[ \mathcal{L}(\mathcal{X}, \phi, \psi) = \mathcal{L}^s(\mathcal{X}, \phi, \psi) + \lambda_\phi \cdot \lVert \phi \rVert^2 \]
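The two loss terms can be checked on a small worked example. The sketch below assumes a hypothetical `toy_decoder` (a sphere whose radius is $1 + \phi_0$, ignoring $\psi$) in place of the trained network, and an arbitrary weight `lam`; neither comes from the source.

```python
import math

def toy_decoder(x, phi, psi):
    # Hypothetical stand-in for f_theta: SDF of a sphere with radius 1 + phi[0].
    # psi is accepted for signature compatibility but unused here.
    return math.sqrt(sum(c * c for c in x)) - (1.0 + phi[0])

def sdf_l1_loss(decoder, points, targets, phi, psi):
    """Mean absolute error between predicted and target SDF values (L^s)."""
    errors = (abs(decoder(x, phi, psi) - s) for x, s in zip(points, targets))
    return sum(errors) / len(points)

def total_loss(decoder, points, targets, phi, psi, lam=1e-4):
    """L = L^s + lam * ||phi||^2, with lam playing the role of lambda_phi."""
    reg = sum(c * c for c in phi)
    return sdf_l1_loss(decoder, points, targets, phi, psi) + lam * reg

# Two sample points with SDF targets taken from the unit sphere (phi = 0).
points = [(2.0, 0.0, 0.0), (0.0, 0.0, 0.5)]
targets = [1.0, -0.5]

exact = total_loss(toy_decoder, points, targets, [0.0], [0.0])
perturbed = total_loss(toy_decoder, points, targets, [0.5], [0.0], lam=0.1)
```

With $\phi = 0$ the predictions match the targets and both terms vanish. With $\phi_0 = 0.5$ each prediction is off by 0.5, so the data term is 0.5 and the penalty adds $0.1 \cdot 0.25 = 0.025$.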
Basic inference. Given an instance $\mathcal{X}$, the shape and articulation codes can be inferred by back-propagation:
\[ \arg\min_{\phi, \psi} \mathcal{L}(\mathcal{X}, \phi, \psi) \]
In practice, the articulation code usually converges to a good estimate, but the shape code tends to be noisy. A second optimization is therefore run with the estimated $\psi$ held fixed, updating only the shape code.
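The two-pass procedure can be sketched on a toy problem. Everything here is an assumption for illustration: `toy_decoder` is a hypothetical stand-in for the trained $f_\theta$, finite-difference gradient descent replaces back-propagation, and restarting $\phi$ from zero in the second pass is only one plausible reading of the text.

```python
import math

def toy_decoder(x, phi, psi):
    # Hypothetical stand-in for a trained f_theta, linear in both codes.
    return math.sqrt(sum(c * c for c in x)) - (1.0 + phi[0]) - psi[0] * x[0]

def loss(points, targets, phi, psi, lam=1e-4):
    data = sum(abs(toy_decoder(x, phi, psi) - s) for x, s in zip(points, targets))
    return data / len(points) + lam * sum(c * c for c in phi)

def minimize(objective, start, steps=300, lr=0.01, eps=1e-5):
    """Gradient descent with central finite differences.

    Returns the best iterate seen, since a fixed step size on an L1 loss
    can oscillate around the optimum.
    """
    params, best = list(start), list(start)
    best_value = objective(best)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, down = list(params), list(params)
            up[i] += eps
            down[i] -= eps
            grad.append((objective(up) - objective(down)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
        value = objective(params)
        if value < best_value:
            best, best_value = list(params), value
    return best

# Synthetic observation generated from known codes (illustrative values).
points = [(-0.5, 0.2, 0.1), (0.0, -0.3, 0.4), (0.5, 0.1, -0.2), (0.25, 0.4, 0.3)]
targets = [toy_decoder(x, [0.3], [0.5]) for x in points]

# Pass 1: optimize phi and psi jointly, packed as [phi_0, psi_0].
joint = minimize(lambda p: loss(points, targets, p[:1], p[1:]), [0.0, 0.0])
phi_first, psi_hat = joint[:1], joint[1:]

# Pass 2: hold psi_hat fixed and re-optimize only the shape code.
phi_hat = minimize(lambda p: loss(points, targets, p, psi_hat), [0.0])
```

Only `phi` appears in the parameter list of the second call, which is what holding $\psi$ fixed amounts to in code.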
Test-Time Adaptation inference. Keeping the network parameters fixed can be problematic for out-of-distribution instances. Given the codes $\hat{\phi}$ and $\hat{\psi}$ estimated by basic inference, we fix the articulation network and update the shape encoder as follows:
\[ \hat{f}_s = \arg\min_{f_s} \mathcal{L}(\mathcal{X}, \hat{\phi}, \hat{\psi}) \]
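A minimal sketch of this adaptation step follows, again under loud assumptions: the two-parameter `f_s` and `f_a`, the weight names `w_s` and `W_A`, and the finite-difference optimizer are all hypothetical stand-ins, and the codes `phi_hat` and `psi_hat` are taken as given.

```python
import math

def f_s(x, phi, w_s):
    # Toy shape encoder with trainable weights w_s (hypothetical form).
    return w_s[0] * math.sqrt(sum(c * c for c in x)) + w_s[1] * phi[0]

def f_a(feature, x, psi, w_a):
    # Toy articulation network; its weights w_a stay frozen during adaptation.
    return feature + w_a[0] * psi[0] * x[0] + w_a[1]

def loss(points, targets, phi, psi, w_s, w_a, lam=1e-4):
    preds = (f_a(f_s(x, phi, w_s), x, psi, w_a) for x in points)
    data = sum(abs(p - s) for p, s in zip(preds, targets)) / len(points)
    return data + lam * sum(c * c for c in phi)

def minimize(objective, start, steps=300, lr=0.01, eps=1e-5):
    """Finite-difference gradient descent that returns the best iterate seen."""
    params, best = list(start), list(start)
    best_value = objective(best)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, down = list(params), list(params)
            up[i] += eps
            down[i] -= eps
            grad.append((objective(up) - objective(down)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
        value = objective(params)
        if value < best_value:
            best, best_value = list(params), value
    return best

W_S_TRAINED = [1.0, -1.0]        # shape-encoder weights after training
W_A = [-1.0, -1.0]               # articulation-network weights (frozen)
phi_hat, psi_hat = [0.3], [0.5]  # codes assumed to come from basic inference

# Out-of-distribution instance: generated by a shape encoder the model never saw.
points = [(1.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.3, 0.4, 0.0), (0.0, 0.0, 1.5)]
targets = [f_a(f_s(x, phi_hat, [1.2, -1.0]), x, psi_hat, W_A) for x in points]

# Only w_s is optimized; phi_hat, psi_hat, and W_A are closed over unchanged.
w_s_adapted = minimize(
    lambda w: loss(points, targets, phi_hat, psi_hat, w, W_A), W_S_TRAINED
)
```

Because the codes are fixed, the $\lambda_\phi \lVert \phi \rVert^2$ term is constant here and does not affect which $f_s$ is selected.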