
Towards harmonized regional style transfer and manipulation for facial images

Cong Wang¹, Fan Tang² (✉), Yong Zhang³, Tieru Wu¹,² (✉), Weiming Dong⁴

¹ School of Mathematics, Jilin University, Changchun 130012, China
² School of Artificial Intelligence, Jilin University, Changchun 130012, China
³ AI Lab, Tencent, Shenzhen 518054, China
⁴ Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Abstract

Regional facial image synthesis conditioned on a semantic mask has attracted considerable attention in the field of computational visual media. However, the appearances of different regions may be inconsistent with each other after regional editing. In this paper, we focus on harmonized regional style transfer for facial images. We propose a multi-scale encoder for accurate style code extraction. The key part of our work is a multi-region style attention module, which adapts multiple regional style embeddings from a reference image to a target image to generate a harmonious result. We also propose style mapping networks for multi-modal style synthesis, and further employ an invertible flow model, which can serve as a mapping network, to fine-tune a style code by inverting it to the latent space. We evaluated our model on three widely used face datasets by transferring regional facial appearance between datasets. The results show that our model can reliably perform style transfer and multi-modal manipulation, generating output comparable to the state of the art.
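To make the central idea concrete, the following is a minimal, illustrative PyTorch sketch (not the authors' released code) of one way a multi-region style attention step could be realized: per-region style codes extracted from the target and reference images attend to one another, so that each transferred regional style is adjusted in the context of all the others rather than pasted in isolation. The class name, style_dim, and the choice of 19 facial regions are assumptions made for this example.

import torch
import torch.nn as nn

class MultiRegionStyleAttention(nn.Module):
    """Adapt reference-region style codes to a target via attention (sketch)."""
    def __init__(self, style_dim: int):
        super().__init__()
        self.query = nn.Linear(style_dim, style_dim)
        self.key = nn.Linear(style_dim, style_dim)
        self.value = nn.Linear(style_dim, style_dim)
        self.scale = style_dim ** -0.5

    def forward(self, target_styles, reference_styles):
        # Both inputs: (batch, num_regions, style_dim) per-region style codes.
        q = self.query(target_styles)
        k = self.key(reference_styles)
        v = self.value(reference_styles)
        # Each target region attends to every reference region, so the style
        # transferred to one region stays consistent with all the others.
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        # A residual connection keeps the target's original appearance as a base.
        return target_styles + attn @ v

# Usage: harmonize 19 regional style codes of dimension 512 (both hypothetical).
module = MultiRegionStyleAttention(style_dim=512)
target = torch.randn(1, 19, 512)
reference = torch.randn(1, 19, 512)
harmonized = module(target, reference)  # shape: (1, 19, 512)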

Keywords: style transfer, face manipulation, generative models, facial harmonization

Electronic supplementary material

Video: 41095_0284_ESM2.mp4
File: 41095_0284_ESM1.pdf (18.6 MB)

Publication history

Received: 18 February 2022
Accepted: 16 March 2022
Published: 03 January 2023
Issue date: June 2023

Copyright

© The Author(s) 2022.

Acknowledgements

This work was partly supported by the National Key R&D Program of China (No. 2020YFA0714100) and the National Natural Science Foundation of China (Nos. 61872162, 62102162, 61832016, U20B2070).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
