High fidelity virtual try-on network via semantic adaptation and distributed componentization

Chenghu Du; Feng Yu; Minghua Jiang; Ailing Hua; Yaxin Zhao; Xiong Wei; Tao Peng; Xinrong Hu

doi:10.1007/s41095-021-0264-2

Computational Visual Media 2022, 8(4): 649-663 https://doi.org/10.1007/s41095-021-0264-2

Research Article | Open Access | Published: 16 June 2022

Chenghu Du (1), Feng Yu (1, 2), Minghua Jiang (1, 2), Ailing Hua (1), Yaxin Zhao (1), Xiong Wei (1), Tao Peng (1, 2), Xinrong Hu (1, 2)

(1) School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China

(2) Engineering Research Center of Hubei Province for Clothing Information, Wuhan 430200, China

Keywords: virtual try-on, conditional image synthesis, human parsing, thin plate spline, semantic adaptation

Cite this article:

Du C, Yu F, Jiang M, et al. High fidelity virtual try-on network via semantic adaptation and distributed componentization. Computational Visual Media, 2022, 8(4): 649-663. https://doi.org/10.1007/s41095-021-0264-2

Abstract

Image-based virtual try-on systems have significant commercial value in online garment shopping. However, prior methods fail to handle details appropriately, and so do not maintain the original appearance of organizational items, including the arms, the neck, and in-shop garments. We propose a novel high fidelity virtual try-on network to generate realistic results. Specifically, a distributed pipeline is used to generate the organizational items simultaneously. First, the in-shop garment is warped using thin plate splines (TPS) to give a coarse shape reference, and a corresponding target semantic map is then generated, which adaptively responds to the distribution of different items triggered by different garments. Second, organizational items are componentized separately using our novel semantic map-based image adjustment network (SMIAN) to avoid interference between body parts. Finally, SMIAN integrates all components to generate the overall result. A priori dual-modal information is incorporated in the tail layers of SMIAN to improve the convergence rate of the network. Experiments demonstrate that the proposed method retains the details of the condition information better than current methods. Our method achieves convincing quantitative and qualitative results on existing benchmark datasets.
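As an illustration of the thin plate spline (TPS) warp mentioned in the abstract, the following is a minimal NumPy sketch of generic TPS interpolation. It is not the authors' implementation: the abstract does not say how the TPS parameters are obtained, so this sketch assumes they are fitted directly from matched control points. The function names `fit_tps` and `apply_tps` and the control-point layout are hypothetical.

```python
import numpy as np


def _tps_kernel(r2):
    # Radial basis r^2 * log(r^2), with the value at r = 0 defined as 0.
    # This equals the usual TPS kernel r^2 * log(r) up to a constant factor.
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out


def fit_tps(src, dst):
    """Fit a 2D TPS that maps control points src (n, 2) onto dst (n, 2).

    Returns an (n + 3, 2) array: n radial weights followed by 3 affine
    coefficients, one column per output coordinate.
    """
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    K = _tps_kernel(d2)
    P = np.hstack([np.ones((n, 1)), src])  # affine basis [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)


def apply_tps(params, src, pts):
    """Warp query points pts (m, 2) with a TPS fitted on control points src."""
    d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    U = _tps_kernel(d2)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return U @ params[:-3] + P @ params[-3:]


# Hypothetical control points: four corners and the centre of a unit square.
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], dtype=float)
dst = src.copy()
dst[4] += [0.1, 0.05]  # push the centre point, keep the corners fixed

params = fit_tps(src, dst)
warped = apply_tps(params, src, src)  # control points land on dst
```

In a try-on setting, the query points would typically be the pixel grid of the garment image, and the warped coordinates would drive a resampling step; that step is omitted here.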

About this article

Publication history

Received: 14 September 2021

Accepted: 03 November 2021

Published: 16 June 2022

Issue date: December 2022

Acknowledgements

This manuscript is an extended version of our previous work, which appeared at the IEEE International Conference on Tools with Artificial Intelligence (C. Du et al. VTON-HF: High fidelity virtual try-on network via semantic adaptation. ICTAI 2021, 224-231, doi: 10.1109/ICTAI52525.2021.00038). We declare that this manuscript is submitted to Computational Visual Media with permission.

We would like to thank the anonymous reviewers for their constructive comments. The findings and observations in this paper are those of the authors and do not necessarily reflect the views of the supporters.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.