



Unsupervised image translation with distributional semantics awareness

Zhexi Peng1, He Wang2, Yanlin Weng1 (corresponding author), Yin Yang3, Tianjia Shao1
1 State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
2 School of Computing, University of Leeds, Leeds, UK
3 School of Computing, Clemson University, Clemson, USA

Abstract

Unsupervised image translation (UIT) studies the mapping between two image domains. Since such mappings are under-constrained, existing research has pursued various desirable properties such as distributional matching or two-way consistency. In this paper, we re-examine UIT from a new perspective: distributional semantics consistency, based on the observation that data variations contain semantics, e.g., shoes varying in color. Further, the semantics can be multi-dimensional, e.g., shoes also varying in style, functionality, etc. Given two image domains, matching these semantic dimensions during UIT will produce mappings with explicable correspondences, which has not been investigated previously. We propose distributional semantics mapping (DSM), the first UIT method that explicitly matches semantics between two domains. We show that distributional semantics has rarely been considered within or beyond UIT, even though it is a common problem in deep learning. We evaluate DSM on several benchmark datasets, demonstrating its general ability to capture distributional semantics. Extensive comparisons show that DSM not only produces explicable mappings, but also improves image quality in general.

Keywords: unsupervised learning, image-to-image translation, generative adversarial networks (GANs), manifold alignment, distributional semantics
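To make the idea of matching distributional semantics more concrete, the sketch below shows one plausible way such a term could sit alongside a standard adversarial translation objective: the semantic codes of translated images are pushed to match the per-dimension statistics of codes from real target-domain images. This is a minimal illustration under our own assumptions (the names Enc_Y, Gen_XY, Dis_Y, the moment-matching loss, and the loss weight are hypothetical); it is not the DSM implementation described in the paper.

```python
# Illustrative sketch only -- NOT the authors' DSM implementation.
# It adds a "distributional semantics" term to a generic unsupervised
# translation objective by matching per-dimension moments of semantic codes.

import torch


def semantic_distribution_loss(code_fake, code_real):
    """Match mean and variance of each semantic dimension across a batch.

    code_fake: (N, D) semantic codes of translated images
    code_real: (N, D) semantic codes of real target-domain images
    """
    mean_gap = (code_fake.mean(dim=0) - code_real.mean(dim=0)).pow(2).mean()
    var_gap = (code_fake.var(dim=0) - code_real.var(dim=0)).pow(2).mean()
    return mean_gap + var_gap


def training_step(x, y, Enc_Y, Gen_XY, Dis_Y, lambda_sem=1.0):
    """Hypothetical generator loss for the X -> Y direction."""
    fake_y = Gen_XY(x)                     # translate X -> Y
    adv = -Dis_Y(fake_y).mean()            # generic adversarial term for the generator
    sem = semantic_distribution_loss(Enc_Y(fake_y), Enc_Y(y))
    return adv + lambda_sem * sem


if __name__ == "__main__":
    # Tiny smoke test with random data and linear stand-ins for the networks.
    N, C, D = 8, 16, 4
    Enc_Y = torch.nn.Linear(C, D)
    Gen_XY = torch.nn.Linear(C, C)
    Dis_Y = torch.nn.Linear(C, 1)
    x, y = torch.randn(N, C), torch.randn(N, C)
    print(training_step(x, y, Enc_Y, Gen_XY, Dis_Y).item())
```

In practice the moment-matching term could be replaced by any distribution-alignment loss (e.g., an adversarial or optimal-transport objective) applied in the semantic-code space rather than the image space; the point of the sketch is only where such a term enters the training step.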


Publication history

Received: 01 March 2022
Accepted: 15 May 2022
Published: 18 April 2023
Issue date: September 2023

Copyright

© The Author(s) 2023.

Acknowledgements

We thank the anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (Grant No. 61772462) and the 100 Talents Program of Zhejiang University.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
