Research Article | Open Access

MMRelief: Modeling multi-human relief from a single photograph

Faculty of Mechanical Engineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210010, China
School of Computer Science and Technology, Shandong University, Jinan 250100, China

Abstract

This study addresses multi-human relief modeling from a single photograph. Although previous studies have successfully modeled 3D humans from single photographs, they are limited to reconstructing individuals and cannot be applied to multi-human scenes with complex inter-body and outer-body occlusions. We introduce MMRelief, a novel solution that takes a significant step toward high-quality, generalizable multi-human relief modeling. MMRelief adopts a three-step approach. First, it predicts an occlusion-aware depth map based on ZoeDepth [12]. Next, it predicts a detailed normal map using a photo-to-normal network. Finally, it combines the strengths of both maps and constructs the human relief via depth-constrained normal integration. Experimental results demonstrate that MMRelief achieves state-of-the-art performance in human normal estimation. It handles human photographs of different styles, with varying poses and clothing, while producing reliefs with accurate body occlusions, reasonable depth ordering, and faithful geometric details. The project page is at https://github.com/yanqingliu3856/MMRelief.
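To make the final step concrete, the following is a minimal Python sketch of depth-constrained normal integration in the standard variational form surveyed by Quéau et al. [37]: recover a height field whose finite differences match the gradients implied by the predicted normal map, while a soft penalty keeps it close to the occlusion-aware depth map. The function name `integrate_normals`, the weight `lam`, and the forward-difference discretization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.sparse import diags, eye, identity, kron
from scipy.sparse.linalg import spsolve

def integrate_normals(normals, depth, lam=0.1):
    """Recover a relief height field z from a normal map, softly
    constrained to a coarse depth prediction (e.g., from ZoeDepth [12]).

    normals: (H, W, 3) unit normals (nx, ny, nz), camera-facing nz > 0
    depth:   (H, W) occlusion-aware depth map
    lam:     weight of the depth-fidelity term (hypothetical default)
    """
    H, W = depth.shape
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    nz = np.clip(nz, 1e-3, None)     # guard against grazing normals
    p = -nx / nz                     # target derivative dz/dx
    q = -ny / nz                     # target derivative dz/dy
    # Sign conventions (image y-axis, depth vs. height) depend on the
    # camera setup; flipped signs only mirror the relief.

    # Sparse forward-difference operator along one axis.
    def diff1d(n):
        e = np.ones(n - 1)
        return diags([-e, e], [0, 1], shape=(n - 1, n))

    Dx = kron(eye(H), diff1d(W))     # x-differences, (H*(W-1)) x (H*W)
    Dy = kron(diff1d(H), eye(W))     # y-differences, ((H-1)*W) x (H*W)

    # Normal equations of
    #   min_z ||Dx z - p||^2 + ||Dy z - q||^2 + lam * ||z - depth||^2
    A = (Dx.T @ Dx + Dy.T @ Dy + lam * identity(H * W)).tocsr()
    b = (Dx.T @ p[:, :-1].ravel()
         + Dy.T @ q[:-1, :].ravel()
         + lam * depth.ravel())
    return spsolve(A, b).reshape(H, W)
```

As lam approaches zero, this reduces to plain least-squares (Poisson-style) integration of the normal field, which preserves fine geometric detail but can drift in absolute depth; increasing lam anchors the relief to the predicted depth ordering, illustrating how the strengths of the two maps can be combined.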

References

[1] Zhang, Y. W.; Zhang, C.; Wang, W.; Chen, Y.; Ji, Z.; Liu, H. Portrait relief modeling from a single image. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 8, 2659–2670, 2020.
[2] Zhang, Y. W.; Luo, P.; Zhou, H.; Ji, Z.; Liu, H.; Chen, Y.; Zhang, C. Neural modeling of portrait bas-relief from a single photograph. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 12, 5008–5019, 2022.
[3] Liu, Y.; Ji, Z.; Zhang, Y. W.; Xu, G. Example-driven modeling of portrait bas-relief. Computer Aided Geometric Design Vol. 80, Article No. 101860, 2020.
[4] Zhang, Y. W.; Wang, J.; Wang, W.; Chen, Y.; Liu, H.; Ji, Z.; Zhang, C. Neural modelling of flower bas-relief from 2D line drawing. Computer Graphics Forum Vol. 40, No. 6, 288–303, 2021.
[5] Zhang, Y. W.; Wang, J.; Long, W.; Liu, H.; Zhang, C.; Chen, Y. Fast solution for Chinese calligraphy relief modeling from 2D handwriting image. The Visual Computer Vol. 36, No. 9, 2241–2250, 2020.
[6] Saito, S.; Huang, Z.; Natsume, R.; Morishima, S.; Li, H.; Kanazawa, A. PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2304–2314, 2019.
[7] He, T.; Xu, Y.; Saito, S.; Soatto, S.; Tung, T. ARCH++: Animation-ready clothed human reconstruction revisited. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 11046–11056, 2021.
[8] Alldieck, T.; Zanfir, M.; Sminchisescu, C. Photorealistic monocular 3D reconstruction of humans wearing clothing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1506–1515, 2022.
[9] Xiu, Y.; Yang, J.; Tzionas, D.; Black, M. J. ICON: Implicit clothed humans obtained from normals. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 512–523, 2022.
[10] Yang, Z.; Chen, B.; Zheng, Y.; Chen, X.; Zhou, K. Human bas-relief generation from a single photograph. IEEE Transactions on Visualization and Computer Graphics Vol. 28, No. 12, 4558–4569, 2022.
[11] Ranftl, R.; Lasinger, K.; Hafner, D.; Schindler, K.; Koltun, V. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 44, No. 3, 1623–1637, 2022.
[12] Bhat, S. F.; Birkl, R.; Wofk, D.; Wonka, P.; Müller, M. ZoeDepth: Zero-shot transfer by combining relative and metric depth. arXiv preprint arXiv:2302.12288, 2023.
[13] Zhang, Y. W.; Wu, J.; Ji, Z.; Wei, M.; Zhang, C. Computer-assisted relief modelling: A comprehensive survey. Computer Graphics Forum Vol. 38, No. 2, 521–534, 2019.
[14] Alexa, M.; Matusik, W. Reliefs as images. ACM Transactions on Graphics Vol. 29, No. 4, Article No. 1, 2010.
[15] Wu, J.; Martin, R. R.; Rosin, P. L.; Sun, X. F.; Lai, Y. K.; Liu, Y. H.; Wallraven, C. Use of non-photorealistic rendering and photometric stereo in making bas-reliefs from photographs. Graphical Models Vol. 76, No. 4, 202–213, 2014.
[16] Yeh, C. K.; Huang, S. Y.; Jayaraman, P. K.; Fu, C. W.; Lee, T. Y. Interactive high-relief reconstruction for organic and double-sided objects from a photo. IEEE Transactions on Visualization and Computer Graphics Vol. 23, No. 7, 1796–1808, 2017.
[17] Ji, Z.; Sun, X.; Zhang, Y. W.; Ma, W.; Wei, M. Normal manipulation for bas-relief modeling. Graphical Models Vol. 114, Article No. 101099, 2021.
[18] Ji, Z.; Feng, W.; Sun, X.; Qin, F.; Wang, Y.; Zhang, Y. W.; Ma, W. ReliefNet: Fast bas-relief generation from 3D scenes. Computer-Aided Design Vol. 130, Article No. 102928, 2021.
[19] Ji, Z.; Zhou, C.; Zhang, Q.; Zhang, Y. W.; Wang, W. A deep residual network for geometric decontouring. Computer Graphics Forum Vol. 39, No. 7, 27–41, 2020.
[20] Tang, S.; Tan, F.; Cheng, K.; Li, Z.; Zhu, S.; Tan, P. A neural network for detailed human depth estimation from a single image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 7750–7759, 2019.
[21] Jafarian, Y.; Park, H. S. Learning high fidelity depths of dressed humans by watching social media dance videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12753–12762, 2021.
[22] Mustafa, A.; Caliskan, A.; Agapito, L.; Hilton, A. Multi-person implicit reconstruction from a single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14474–14483, 2021.
[23] Han, S. H.; Park, M. G.; Yoon, J. H.; Kang, J. M.; Park, Y. J.; Jeon, H. G. High-fidelity 3D human digitization from single 2K resolution images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12869–12879, 2023.
[24] Saito, S.; Simon, T.; Saragih, J.; Joo, H. PIFuHD: Multi-level pixel-aligned implicit function for high-resolution 3D human digitization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 84–93, 2020.
[25] Zheng, Z.; Yu, T.; Liu, Y.; Dai, Q. PaMIR: Parametric model-conditioned implicit representation for image-based human reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 44, No. 6, 3170–3184, 2022.
[26] Zheng, Y.; Shao, R.; Zhang, Y.; Yu, T.; Zheng, Z.; Dai, Q.; Liu, Y. DeepMultiCap: Performance capture of multiple characters using sparse multiview cameras. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 6239–6249, 2021.
[27] Huang, Z.; Xu, Y.; Lassner, C.; Li, H.; Tung, T. ARCH: Animatable reconstruction of clothed humans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3093–3102, 2020.
[28] Miangoleh, S. M. H.; Dille, S.; Mai, L.; Paris, S.; Aksoy, Y. Boosting monocular depth estimation models to high-resolution via content-adaptive multi-resolution merging. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9685–9694, 2021.
[29] Cao, Z.; Simon, T.; Wei, S. E.; Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7291–7299, 2017.
[30] Blender. Available at https://www.blender.org
[31] Yu, T.; Zheng, Z.; Guo, K.; Liu, P.; Dai, Q.; Liu, Y. Function4D: Real-time human volumetric capture from very sparse consumer RGBD sensors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5746–5756, 2021.
[32] Renderpeople. Available at https://www.renderpeople.com
[33] Zheng, Y.; Shao, R.; Zhang, Y.; Yu, T.; Zheng, Z.; Dai, Q.; Liu, Y. DeepMultiCap: Performance capture of multiple characters using sparse multiview cameras. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 6239–6249, 2021.
[34] Chen, X.; Wang, Y.; Chen, X.; Zeng, W. S2R-DepthNet: Learning a generalizable depth-specific structural representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3034–3043, 2021.
[35] Huang, X.; Liu, M. Y.; Belongie, S.; Kautz, J. Multimodal unsupervised image-to-image translation. In: Computer Vision – ECCV 2018. Lecture Notes in Computer Science, Vol. 11207. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer, Cham, 179–196, 2018.
[36] Fu, J.; Li, S.; Jiang, Y.; Lin, K. Y.; Qian, C.; Loy, C. C.; Wu, W.; Liu, Z. StyleGAN-Human: A data-centric odyssey of human generation. In: Computer Vision – ECCV 2022. Lecture Notes in Computer Science, Vol. 13676. Avidan, S.; Brostow, G.; Cissé, M.; Farinella, G. M.; Hassner, T. Eds. Springer, Cham, 1–19, 2022.
[37] Quéau, Y.; Durou, J. D.; Aujol, J. F. Variational methods for normal integration. Journal of Mathematical Imaging and Vision Vol. 60, No. 4, 609–632, 2018.
[38] Humano. Available at https://humano3d.com
[39] Ji, Z.; Che, F.; Liu, H.; Zhao, Z.; Zhang, Y. W.; Wang, W. Photo2Relief: Let human in the photograph stand out. arXiv preprint arXiv:2307.11364, 2023.
[40] Fernandez Abrevaya, V.; Boukhayma, A.; Torr, P. H. S.; Boyer, E. Cross-modal deep face normals with deactivable skip connections. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4979–4989, 2020.
[41] Zhang, Y. W.; Qin, B. B.; Chen, Y.; Ji, Z.; Zhang, C. Portrait relief generation from 3D object. Graphical Models Vol. 102, 10–18, 2019.
[42] Xiu, Y.; Yang, J.; Tzionas, D.; Black, M. J. ICON: Implicit clothed humans obtained from normals. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13296–13306, 2022.
[43] Ma, Q.; Yang, J.; Ranjan, A.; Pujades, S.; Pons-Moll, G.; Tang, S.; Black, M. J. Learning to dress 3D people in generative clothing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6469–6478, 2020.
[44] Lin, S.; Yang, L.; Saleemi, I.; Sengupta, S. Robust high-resolution video matting with temporal guidance. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 3132–3141, 2022.
Cite this article:
Zhang Y-W, Liu Y, Yang H, et al. MMRelief: Modeling multi-human relief from a single photograph. Computational Visual Media, 2025, 11(3): 531-548. https://doi.org/10.26599/CVM.2025.9450394


Received: 25 July 2023
Accepted: 26 November 2023
Published: 19 May 2025
© The Author(s) 2025.

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
