Born-Again Networks
BANs exploit the idea, demonstrated by knowledge distillation (KD), that the information contained in a teacher model's output distribution $f(x, \theta_1^*)$ can provide a rich source of training signal, leading to a second solution $f(x, \theta_2^*)$, $\theta_2^* \in \Theta_2$, with better generalization ability.
Sequence of teaching selves: born-again networks ensemble. The $k$-th model in the sequence is trained with knowledge transferred from the $(k-1)$-th student, i.e. by minimizing
\[ \mathcal{L}(f(x, \arg \min_{\theta_{k-1}} \mathcal{L}(f(x, \theta_{k-1}))), f(x, \theta_k)) \]
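A minimal sketch of this sequential training in PyTorch. The model factory `make_model`, the data loader `loader`, and the hyperparameters are hypothetical placeholders; the paper trains each generation to convergence before it becomes the next teacher.

```python
import torch
import torch.nn.functional as F

def train_generation(student, teacher, loader, epochs=1, lr=0.1):
    """Train one BAN generation: the student matches the (frozen) teacher's
    output distribution; with teacher=None this is ordinary supervised training."""
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            z = student(x)                       # student logits
            if teacher is None:
                loss = F.cross_entropy(z, y)     # generation 0: hard labels
            else:
                with torch.no_grad():
                    t = teacher(x)               # teacher logits (frozen)
                # cross-entropy between teacher and student distributions
                loss = -(F.softmax(t, dim=1) * F.log_softmax(z, dim=1)).sum(1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

def born_again(make_model, loader, k):
    """Produce k generations; each one is born again from the previous."""
    teacher, generations = None, []
    for _ in range(k):
        student = train_generation(make_model(), teacher, loader)
        generations.append(student)
        teacher = student                        # student becomes the next teacher
    return generations
```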
Finally, Born-Again Network Ensembles (BANE) predict by averaging the predictions of multiple generations of BANs:
\[ \hat{f}^{k}(x) = \frac{1}{k} \sum_{i=1}^k f(x, \theta_i) \]
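A sketch of the BANE prediction rule, assuming `generations` is the list returned by the hypothetical `born_again` helper above:

```python
import torch
import torch.nn.functional as F

def bane_predict(generations, x):
    """Average the softmax predictions of all k generations (BANE)."""
    with torch.no_grad():
        probs = [F.softmax(m(x), dim=1) for m in generations]
    return torch.stack(probs).mean(0)            # corresponds to \hat{f}^k(x)
```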
Dark knowledge under the light. For a single sample, the gradient of the cross-entropy between the student's softmax output (logits $z_j$) and the teacher's softmax output (logits $t_j$), taken with respect to the $i$-th student logit, is:
\[ \frac{\partial \mathcal{L}_i}{\partial z_i} = q_i - p_i = \frac{\exp(z_i)}{\sum_{j=1}^n \exp(z_j)} - \frac{\exp(t_i)}{\sum_{j=1}^n \exp(t_j)} \]
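This identity is easy to verify numerically with autograd; a toy check with random logits (all names here are illustrative):

```python
import torch
import torch.nn.functional as F

z = torch.randn(5, requires_grad=True)           # student logits
t = torch.randn(5)                               # teacher logits
loss = -(F.softmax(t, dim=0) * F.log_softmax(z, dim=0)).sum()
loss.backward()
q = F.softmax(z, dim=0).detach()                 # student distribution q_i
p = F.softmax(t, dim=0)                          # teacher distribution p_i
assert torch.allclose(z.grad, q - p, atol=1e-6)  # matches dL/dz_i = q_i - p_i
```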
When the target probability distribution corresponds to the ground-truth one-hot label (with $*$ denoting the true class), this reduces to:
\[ \frac{\partial \mathcal{L}_*}{\partial z_*} = q_* - y_* = \frac{\exp(z_*)}{\sum_{j=1}^n \exp(z_j)} - 1 \]
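The same check for the one-hot case, where the teacher distribution is replaced by a ground-truth label (class 0 here, chosen arbitrarily):

```python
import torch
import torch.nn.functional as F

z = torch.randn(5, requires_grad=True)           # student logits
y = torch.tensor(0)                              # ground-truth class *
F.cross_entropy(z.unsqueeze(0), y.unsqueeze(0)).backward()
q = F.softmax(z, dim=0).detach()
assert torch.allclose(z.grad[y], q[y] - 1.0)     # matches dL/dz_* = q_* - 1
```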