Perceptual GAN

Perceptual GANs

Nov 2021

Wufei Ma
Purdue University

Abstract

Paper reading notes for Perceptual generative adversarial networks for small object detection [1].

Previous object detection pipelines usually detect small objects by learning representations of all objects at multiple scales, but the performance gain is usually too limited to pay off the computational cost. In this work, the authors propose a new Perceptual Generative Adversarial Network (Perceptual GAN) model that improves small object detection by narrowing the representation difference between small objects and large ones.

Perceptual GANs

The generator is a deep residual-based feature generative model that transforms the original poor features of small objects into highly discriminative ones by introducing fine-grained details from lower-level layers, achieving "super-resolution" on the intermediate representations. The discriminator serves as a supervisor and provides guidance on the quality and advantages of the generated fine-grained details. The Perceptual GAN also includes a new perceptual loss tailored for detection.

Perceptual GANs. Let $F_l$ and $F_s$ be the representations of large and small objects, respectively. We aim to learn a generator function $G$ that transforms the representation of a small object $F_s$ into a super-resolved one $G(F_s)$ that is similar to the representation of a large object $F_l$. A new conditional generator model is introduced to generate the residual representation between large and small objects, conditioned on extra auxiliary information, i.e., the low-level features $f$ of the small object: \[ \min_G \max_D L(D, G) \triangleq \mathbb{E}_{F_l \sim p_{data}(F_l)} [\log D(F_l)] + \mathbb{E}_{F_s \sim p_{F_s}(F_s \mid f)} [\log(1 - D(F_s + G(F_s \mid f)))] \]

The generator $G_{\Theta_g}$ is obtained by optimizing the loss function $L_{dis}$: \[ \Theta_g = \arg \min_{\Theta_g} L_{dis}(G_{\Theta_g}(F_s)) \]

The adversarial branch of the discriminator $D_{\Theta_a}$ is obtained by optimizing the loss function $L_a$: \[ \begin{aligned}\Theta_a & = \arg \min_{\Theta_a} L_a(G_{\Theta_g}(F_s), F_l) \\ L_a & = -\log D_{\Theta_a}(F_l) - \log(1 - D_{\Theta_a}(G_{\Theta_g}(F_s)))\end{aligned} \]

The perception branch of the discriminator $D_{\Theta_p}$ is obtained by optimizing the loss function $L_{dis\_p}$: \[ \Theta_p = \arg \min_{\Theta_p} L_{dis\_p}(F_l) \]
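As a rough numerical illustration of the residual form $F_s + G(F_s \mid f)$ and of the adversarial-branch loss $L_a$, the sketch below uses plain Python lists and scalars. The helper names `super_resolve` and `adversarial_branch_loss`, the toy feature values, and the discriminator scores are all hypothetical stand-ins; they are not the networks or values used in [1].

```python
import math

def super_resolve(f_s, residual):
    """Residual learning: add the generated residual to the small-object features."""
    return [a + b for a, b in zip(f_s, residual)]

def adversarial_branch_loss(d_large, d_generated):
    """L_a = -log D(F_l) - log(1 - D(G(F_s))) for scalar scores strictly inside (0, 1)."""
    return -math.log(d_large) - math.log(1.0 - d_generated)

# Hypothetical toy features; `residual` stands in for the generator output G(F_s | f).
f_s = [0.2, -0.1, 0.5]
residual = [0.3, 0.4, -0.2]
f_sr = super_resolve(f_s, residual)

# A discriminator that separates real from generated features has a low L_a ...
confident = adversarial_branch_loss(d_large=0.9, d_generated=0.1)
# ... while one that cannot tell them apart (score 0.5 for both) has a higher L_a.
fooled = adversarial_branch_loss(d_large=0.5, d_generated=0.5)
```

With these toy scores, `confident` is smaller than `fooled`, which matches the intuition that the adversarial branch minimizes $L_a$ by scoring large-object features high and generated features low.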

Adversarial loss. An adversarial loss is introduced to encourage the generator network to produce super-resolved representations for small objects that are similar to those of large objects. \[ L_{dis\_a} = -\log D_{\Theta_a}(G_{\Theta_g}(F_s)) \]
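To make the behavior of $L_{dis\_a}$ concrete, here is a minimal sketch that evaluates it for scalar discriminator scores. The function name `generator_adversarial_loss` and the example scores are assumptions made for illustration only.

```python
import math

def generator_adversarial_loss(d_generated):
    """L_dis_a = -log D(G(F_s)), where d_generated is a score in (0, 1]."""
    return -math.log(d_generated)

# Hypothetical discriminator scores for the generated (super-resolved) features.
loss_when_caught = generator_adversarial_loss(0.1)   # discriminator rejects them
loss_when_fooling = generator_adversarial_loss(0.9)  # discriminator accepts them
```

The loss shrinks toward zero as the discriminator's score for the generated features approaches 1, so minimizing it pushes the generator toward representations the discriminator treats like those of large objects.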

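Perceptual loss. The multi-task loss $L_{dis\_p}$ measures, for each object proposal, how much detection accuracy benefits from the generated super-resolved features: \[ L_{dis\_p} = L_{cls}(p,g) + \mathbf{1}[g \geq 1] L_{loc}(r_g, r^*) \] where $L_{cls}(p, g) = -\log p_g$ is the log loss for the ground-truth class $g$ and $L_{loc}$ is a smooth $L_1$ loss between the predicted box regression $r_g$ and the regression target $r^*$. The indicator $\mathbf{1}[g \geq 1]$ switches off the localization term for background proposals ($g = 0$).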
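The formula can be sketched for a single proposal as follows. This is a minimal illustration under stated assumptions: class index 0 is taken to be background, the smooth $L_1$ loss uses the Fast R-CNN definition and is summed over four box offsets, and the function names and toy values are hypothetical.

```python
import math

def smooth_l1(x):
    """Smooth L1 on a scalar: 0.5 * x^2 if |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def perceptual_loss(p, g, r_g, r_star):
    """L_dis_p = L_cls(p, g) + 1[g >= 1] * L_loc(r_g, r_star) for one proposal."""
    l_cls = -math.log(p[g])  # log loss for the ground-truth class
    # Assumption: localization loss is summed over the box offsets, foreground only.
    l_loc = sum(smooth_l1(a - b) for a, b in zip(r_g, r_star)) if g >= 1 else 0.0
    return l_cls + l_loc

# Hypothetical class probabilities (index 0 = background) and box offsets.
p = [0.1, 0.7, 0.2]
r_pred = [0.5, 0.0, 2.0, 0.0]
r_target = [0.0, 0.0, 0.0, 0.0]

foreground = perceptual_loss(p, g=1, r_g=r_pred, r_star=r_target)
background = perceptual_loss(p, g=0, r_g=r_pred, r_star=r_target)
```

For the foreground proposal both terms contribute, while for the background proposal only the classification term remains because the indicator is zero.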

References

[1] J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, and S. Yan. Perceptual generative adversarial networks for small object detection. In CVPR, 2017.