Research Article | Open Access

Deformable few-shot face cartoonization via local to global translation

College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

Abstract

Cartoonizing portrait images is a stylish and eye-catching application in both computer vision and graphics. We aim to train a face cartoonization model using very few (e.g., 5–10) style examples. The main difficulty in this challenging task lies in producing stylizations of high quality while preserving the identity of the input, particularly when the style examples contain strong exaggerations. To address this, we propose a novel cross-domain center loss for few-shot generative adversarial network (GAN) adaptation, which forces the distribution of the target domain to be similar to that of the source. We employ it, together with a two-stage strategy, to solve this few-shot problem. Stage I generates an intermediate cartoonization of the input, where we first stylize the individual facial components locally and then deform them to mimic the desired exaggeration under the guidance of landmarks. Stage II focuses on global refinement of the intermediate image. First, we adapt a pretrained StyleGAN model to the target domain defined by the few examples using the proposed cross-domain center loss. Subsequently, the intermediate cartoonization from Stage I is holistically refined through GAN inversion. The generative power of StyleGAN guarantees high image quality, while the local translation and landmark-guided deformation applied to facial components provide high identity fidelity. Experiments show that the proposed method outperforms state-of-the-art few-shot stylization approaches both qualitatively and quantitatively.
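
The abstract only sketches the cross-domain center loss, so the snippet below is a rough, hypothetical illustration rather than the paper's actual formulation. Assuming the loss compares feature centers of images generated from the same latent codes by the frozen source generator and the adapted target generator, one plausible PyTorch realization is as follows (all names such as G_source, G_target, feature_extractor, and lambda_center are placeholders, not taken from the paper):

```python
# Hypothetical sketch (not the paper's code) of a cross-domain center loss for
# few-shot GAN adaptation: images generated from the same latent codes by the
# frozen source generator and the adapted target generator are embedded, and the
# batch centers of the two feature sets are pulled together, anchoring the
# target distribution to the source distribution.
import torch
import torch.nn.functional as F

def cross_domain_center_loss(source_feats: torch.Tensor,
                             target_feats: torch.Tensor) -> torch.Tensor:
    # source_feats, target_feats: (N, D) features of source/target generations
    source_center = source_feats.mean(dim=0)   # center of the source batch
    target_center = target_feats.mean(dim=0)   # center of the target batch
    return F.mse_loss(target_center, source_center)

# Illustrative use inside a StyleGAN fine-tuning step (all names are placeholders):
#   z = torch.randn(batch_size, 512)
#   with torch.no_grad():
#       x_src = G_source(z)          # pretrained on real faces, kept frozen
#   x_tgt = G_target(z)              # copy being adapted to the few cartoon examples
#   loss = adv_loss(D(x_tgt)) + lambda_center * cross_domain_center_loss(
#       feature_extractor(x_src), feature_extractor(x_tgt))
```

In such a scheme, the term would be added to the usual adversarial loss so that the handful of cartoon examples can reshape the style while the adapted generator stays anchored to the source face distribution, consistent with the abstract's description of forcing the target distribution to remain similar to the source.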

Electronic Supplementary Material

cvm-11-2-269_ESM.zip (53.4 MB)

Computational Visual Media
Pages 269–287
Cite this article:
Zhou Y, Li S, Huang H. Deformable few-shot face cartoonization via local to global translation. Computational Visual Media, 2025, 11(2): 269–287. https://doi.org/10.26599/CVM.2025.9450348

Received: 21 December 2022
Accepted: 02 April 2023
Published: 08 May 2025
© The Author(s) 2025.

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
