
Meta-Semi: A Meta-Learning Approach for Semi-Supervised Learning

Yulin Wang¹, Jiayi Guo¹, Jiangshan Wang², Cheng Wu¹, Shiji Song¹, and Gao Huang¹ (corresponding author)

¹ Department of Automation, BNRist, Tsinghua University, Beijing 100084, China
² Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518131, China

Yulin Wang and Jiayi Guo contributed equally to this work.

Abstract

Deep learning based semi-supervised learning (SSL) algorithms have led to promising results in recent years. However, they tend to introduce multiple tunable hyper-parameters, making them less practical in real SSL scenarios, where labeled data is too scarce for extensive hyper-parameter search. In this paper, we propose a novel meta-learning based SSL algorithm (Meta-Semi) that requires tuning only one additional hyper-parameter, compared with a standard supervised deep learning algorithm, to achieve competitive performance under various conditions of SSL. We start by defining a meta optimization problem that minimizes the loss on labeled data by dynamically reweighting the loss on unlabeled samples, which are associated with soft pseudo labels during training. As the meta problem is computationally intensive to solve directly, we propose an efficient algorithm to dynamically obtain the approximate solutions. We show theoretically that Meta-Semi converges to a stationary point of the loss function on labeled data under mild conditions. Empirically, Meta-Semi significantly outperforms state-of-the-art SSL algorithms on the challenging semi-supervised CIFAR-100 and STL-10 tasks, and achieves competitive performance on CIFAR-10 and SVHN.

Keywords: computer vision, deep learning, semi-supervised learning
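
To make the abstract's reweighting idea concrete, the following is a minimal PyTorch-style sketch of one training step. It is an illustration under stated assumptions, not the paper's algorithm: the function name meta_semi_step, the gradient-alignment rule used to approximate the meta problem, and the single threshold hyper-parameter are hypothetical stand-ins for the efficient approximate solution the paper actually derives.

```python
# Illustrative sketch of meta-reweighting for SSL, as described in the
# abstract. NOT the paper's exact algorithm: the gradient-alignment rule
# below is one cheap stand-in for its efficient approximate solution.
import torch
import torch.nn.functional as F

def meta_semi_step(model, optimizer, x_l, y_l, x_u, threshold=0.0):
    """One step on a labeled batch (x_l, y_l) and an unlabeled batch x_u."""
    # Soft pseudo labels for the unlabeled batch (detached, so no
    # gradient flows through the targets themselves).
    logits_u = model(x_u)
    pseudo = F.softmax(logits_u.detach(), dim=1)

    # Per-sample unlabeled losses under the soft pseudo labels.
    loss_u = -(pseudo * F.log_softmax(logits_u, dim=1)).sum(dim=1)

    # Labeled loss: the meta objective the unlabeled weights should serve.
    loss_l = F.cross_entropy(model(x_l), y_l)

    # Approximate the meta problem: keep an unlabeled sample only if the
    # gradient of its loss aligns with the labeled-data gradient. The
    # per-sample loop is written for clarity, not speed.
    params = [p for p in model.parameters() if p.requires_grad]
    g_l = torch.autograd.grad(loss_l, params, retain_graph=True)
    weights = []
    for i in range(loss_u.size(0)):
        g_u = torch.autograd.grad(loss_u[i], params, retain_graph=True)
        alignment = sum((a * b).sum() for a, b in zip(g_l, g_u))
        weights.append((alignment > threshold).float())
    w = torch.stack(weights)  # hard 0/1 weights, no gradient

    # Update the model with the labeled loss plus the reweighted
    # unlabeled loss.
    optimizer.zero_grad()
    total_loss = loss_l + (w * loss_u).mean()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```

In this sketch, the only knob beyond standard supervised training is threshold, echoing the abstract's claim of a single additional hyper-parameter; the per-sample gradient loop would be vectorized, or replaced by the paper's efficient approximation, in practice.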

Electronic supplementary material
AI-2022-0014_ESM.pdf (353.3 KB)

Publication history

Received: 07 December 2022
Revised: 08 January 2023
Accepted: 17 January 2023
Published: 10 March 2023
Issue date: December 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2019YFC1408703), the National Natural Science Foundation of China (No. 62022048), THU-Bosch JCML, and the Beijing Academy of Artificial Intelligence. In particular, we appreciate the valuable discussions with Yitong Xia and Hong Zhang.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).