Learning coherent portrait-to-anime translation via latent cyclic transformation

Yangyang Xu; Shengfeng He; Kwan-Yee K. Wong; Ping Luo

doi:10.26599/CVM.2025.9450454

Research Article | Open Access

Yangyang Xu^¹, Shengfeng He^², Kwan-Yee K. Wong^³, Ping Luo^³

School of Intelligence Science and Engineering, Harbin Institute of Technology (Shenzhen), Shenzhen, China

School of Computing and Information Systems, Singapore Management University, Singapore 188065, Singapore

Department of Computer Science, the University of Hong Kong, Hong Kong 999077, China

Abstract

Translating real portrait video into anime is an application of interest to both consumers and researchers. However, anime differs considerably from portraits, making portrait-to-anime translation challenging. Existing StyleGAN-based portrait stylization works assume that the portrait and stylized generators share the same latent space, but this assumption fails in the style of anime due to the large domain gap. Moreover, directly applying them to each video frame often leads to undesirable temporal inconsistencies. In this paper, we argue that two latent spaces with a large domain gap cannot be shared but can be related by a transformation, and develop a cyclic transformation network to connect the two spaces with two cycle constraints. This provides high-quality translation for each frame. We extend our framework to video transformation by proposing a novel frame interpolation constraint which ensures that in-between frames can be interpolated from their neighboring frames, guaranteeing temporal coherence across translated frames. Together with latent code smoothing regularization, this provides temporally coherent video-to-anime translation. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods both qualitatively and quantitatively.
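
The three ingredients named in the abstract (two cycle constraints between latent spaces, a frame interpolation constraint, and latent code smoothing) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the mean-squared-error losses, the midpoint interpolation applied to latent codes rather than to frames, and the moving-average smoother are all assumptions chosen for illustration.

```python
import numpy as np

def cycle_losses(w_p, w_a, T_pa, T_ap):
    """Two cycle constraints between portrait and anime latent codes.

    T_pa and T_ap are hypothetical callables mapping portrait->anime and
    anime->portrait codes; MSE is an assumed choice of distance.
    """
    # portrait -> anime -> portrait should return the original portrait code
    loss_p = np.mean((T_ap(T_pa(w_p)) - w_p) ** 2)
    # anime -> portrait -> anime should return the original anime code
    loss_a = np.mean((T_pa(T_ap(w_a)) - w_a) ** 2)
    return loss_p, loss_a

def interpolation_loss(codes):
    """Penalise in-between codes that deviate from the midpoint of their neighbours.

    codes has shape (T, D): one translated latent code per frame.
    Linear midpoint interpolation in latent space is an assumption.
    """
    midpoints = 0.5 * (codes[:-2] + codes[2:])
    return np.mean((codes[1:-1] - midpoints) ** 2)

def smooth_codes(codes, window=3):
    """Moving average over time (odd window), with edge padding at the ends."""
    pad = window // 2
    padded = np.pad(codes, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([padded[t:t + window].mean(axis=0) for t in range(len(codes))])
```

In this sketch, a pair of mutually inverse transformations drives both cycle losses to zero, and a sequence of codes lying on a straight line incurs no interpolation penalty; in training, such terms would typically be weighted and summed with the per-frame translation objectives.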

Keywords

GANs; anime; video translation

Electronic Supplementary Material

Video

cvm-12-3-787_ESM.mp4

Computational Visual Media

Volume 12, Issue 3, June 2026

Pages 787-801

DOI: 10.26599/CVM.2025.9450454

Cite this article:

Xu Y, He S, Wong K-YK, et al. Learning coherent portrait-to-anime translation via latent cyclic transformation. Computational Visual Media, 2026, 12(3): 787-801. https://doi.org/10.26599/CVM.2025.9450454

Received: 30 December 2023

Accepted: 22 July 2024

Published: 06 March 2026

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
