In this work [1], the authors presented a deep mutual learning (DML) strategy in which, rather than one-way transfer from a static, pre-defined teacher to a student, an ensemble of students learns collaboratively, with the networks teaching each other throughout training. Experiments showed that DML achieved compelling results on the CIFAR-100 image classification and Market-1501 person re-identification benchmarks.
Deep Mutual Learning
The conventional supervised loss trains the network $\Theta_1$ to predict the correct labels for the training instances. To improve the generalization of $\Theta_1$ to test instances, we use a peer network $\Theta_2$ to provide training experience in the form of its posterior probability $p_2$. The Kullback-Leibler (KL) divergence is used to measure how well the two networks' predictions $p_1$ and $p_2$ match:
\[ D_\text{KL}(p_2 \mid\mid p_1) = \sum_{i=1}^N\sum_{m=1}^M p_2^m(x_i) \log\frac{p_2^m(x_i)}{p_1^m(x_i)} \]
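As a minimal sketch, the KL term above can be computed directly from the two networks' softmax outputs. The function name `kl_divergence` and the list-of-lists input format are illustrative choices, not from the paper:

```python
import math

def kl_divergence(p2, p1):
    """D_KL(p2 || p1), summed over N instances and M classes.

    p2, p1: lists of probability vectors (one per training instance x_i),
    e.g. the softmax posteriors of the two peer networks.
    """
    total = 0.0
    for q, p in zip(p2, p1):          # iterate over the N instances
        for qm, pm in zip(q, p):      # iterate over the M classes
            if qm > 0:                # 0 * log(0/p) is taken as 0
                total += qm * math.log(qm / pm)
    return total
```

Note that the KL divergence is asymmetric: $D_\text{KL}(p_2 \mid\mid p_1)$ measures how far $p_1$ is from the peer's posterior $p_2$, so each network gets its own mimicry term.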
The cross entropy error is given by
\[ L_{\text{C}, \Theta_1} = - \sum_{i=1}^N \sum_{m=1}^M I(y_i, m) \log(p_1^m(x_i)) \]
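The cross-entropy term only picks out the predicted probability of the true class for each instance, since the indicator $I(y_i, m)$ is 1 only when $m = y_i$. A small sketch (function name and input layout are illustrative):

```python
import math

def cross_entropy(probs, labels):
    """Supervised loss L_C summed over N instances.

    probs:  list of predicted probability vectors p_1(x_i)
    labels: list of integer class indices y_i; the indicator
            I(y_i, m) selects the entry p[y_i] of each vector.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))
```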
The overall loss $L_{\Theta_1}$ is then
\[ L_{\Theta_1} = L_{\text{C}, \Theta_1} + D_\text{KL}(p_2 \mid\mid p_1) \]
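Putting the two terms together, a hedged sketch of the per-network objective follows (names are illustrative; the symmetric loss for $\Theta_2$ simply swaps the roles of $p_1$ and $p_2$):

```python
import math

def dml_loss(p1, p2, labels):
    """Overall DML loss for network Theta_1:
    supervised cross-entropy plus the KL mimicry term
    that pulls p1 toward the peer posterior p2.

    p1, p2: lists of probability vectors from the two networks
    labels: list of integer class indices y_i
    """
    ce = -sum(math.log(p[y]) for p, y in zip(p1, labels))
    kl = sum(qm * math.log(qm / pm)
             for q, p in zip(p2, p1)
             for qm, pm in zip(q, p) if qm > 0)
    return ce + kl
```

When the two networks agree exactly, the KL term vanishes and the loss reduces to the ordinary supervised cross-entropy.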
Results
Top-1 accuracy on the CIFAR-100 dataset obtained by various architectures.
Comparison with distillation on CIFAR-100.
Ablation study on the number of networks.
References
[1] Y. Zhang, T. Xiang, T. Hospedales, H. Lu. Deep Mutual Learning. In CVPR, 2018.