
Meta-Semi: A Meta-Learning Approach for Semi-Supervised Learning

Yulin Wang¹, Jiayi Guo¹, Jiangshan Wang², Cheng Wu¹, Shiji Song¹, and Gao Huang¹ (corresponding author)

¹ Department of Automation, BNRist, Tsinghua University, Beijing 100084, China
² Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518131, China

Yulin Wang and Jiayi Guo contributed equally to this work.

Abstract

Deep learning based semi-supervised learning (SSL) algorithms have led to promising results in recent years. However, they tend to introduce multiple tunable hyper-parameters, making them less practical in real SSL scenarios, where labeled data is too scarce for extensive hyper-parameter search. In this paper, we propose a novel meta-learning based SSL algorithm (Meta-Semi) that requires tuning only one additional hyper-parameter, compared with a standard supervised deep learning algorithm, to achieve competitive performance under various conditions of SSL. We start by defining a meta optimization problem that minimizes the loss on labeled data by dynamically reweighting the loss on unlabeled samples, which are associated with soft pseudo labels during training. As the meta problem is computationally intensive to solve directly, we propose an efficient algorithm to dynamically obtain the approximate solutions. We show theoretically that Meta-Semi converges to a stationary point of the loss function on labeled data under mild conditions. Empirically, Meta-Semi significantly outperforms state-of-the-art SSL algorithms on the challenging semi-supervised CIFAR-100 and STL-10 tasks, and achieves competitive performance on CIFAR-10 and SVHN.

Keywords: computer vision, deep learning, semi-supervised learning
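
To make the abstract's reweighting idea concrete, the following is a minimal PyTorch-style sketch of one training step. It is an illustration under stated assumptions, not the paper's algorithm: the function name meta_semi_step, the gradient-alignment rule used to approximate the meta problem, and the single threshold hyper-parameter are hypothetical stand-ins for the efficient approximate solution the paper actually derives.

```python
# Illustrative sketch of meta-reweighting for SSL, as described in the
# abstract. NOT the paper's exact algorithm: the gradient-alignment rule
# below is one cheap stand-in for its efficient approximate solution.
import torch
import torch.nn.functional as F

def meta_semi_step(model, optimizer, x_l, y_l, x_u, threshold=0.0):
    """One step on a labeled batch (x_l, y_l) and an unlabeled batch x_u."""
    # Soft pseudo labels for the unlabeled batch (detached, so no
    # gradient flows through the targets themselves).
    logits_u = model(x_u)
    pseudo = F.softmax(logits_u.detach(), dim=1)

    # Per-sample unlabeled losses under the soft pseudo labels.
    loss_u = -(pseudo * F.log_softmax(logits_u, dim=1)).sum(dim=1)

    # Labeled loss: the meta objective the unlabeled weights should serve.
    loss_l = F.cross_entropy(model(x_l), y_l)

    # Approximate the meta problem: keep an unlabeled sample only if the
    # gradient of its loss aligns with the labeled-data gradient. The
    # per-sample loop is written for clarity, not speed.
    params = [p for p in model.parameters() if p.requires_grad]
    g_l = torch.autograd.grad(loss_l, params, retain_graph=True)
    weights = []
    for i in range(loss_u.size(0)):
        g_u = torch.autograd.grad(loss_u[i], params, retain_graph=True)
        alignment = sum((a * b).sum() for a, b in zip(g_l, g_u))
        weights.append((alignment > threshold).float())
    w = torch.stack(weights)  # hard 0/1 weights, no gradient

    # Update the model with the labeled loss plus the reweighted
    # unlabeled loss.
    optimizer.zero_grad()
    total_loss = loss_l + (w * loss_u).mean()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```

In this sketch, the only knob beyond standard supervised training is threshold, echoing the abstract's claim of a single additional hyper-parameter; the per-sample gradient loop would be vectorized, or replaced by the paper's efficient approximation, in practice.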

Electronic supplementary material
AI-2022-0014_ESM.pdf (353.3 KB)

Publication history

Received: 07 December 2022
Revised: 08 January 2023
Accepted: 17 January 2023
Published: 10 March 2023
Issue date: December 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2019YFC1408703), the National Natural Science Foundation of China (No. 62022048), THU-Bosch JCML, and the Beijing Academy of Artificial Intelligence. In particular, we appreciate the valuable discussions with Yitong Xia and Hong Zhang.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).