Research Article | Open Access

FaceCLIP: CLIP-driven accurate and detailed 3D face reconstruction from a single image

School of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
Virtual Reality Research Institute, Beihang University Qingdao Research Institute, Qingdao 266100, China

Abstract

In recent years, 3D face reconstruction has become an active research topic in computer graphics and computer vision. Most current methods based on 3D morphable models (3DMMs) learn displacement maps to recover high-frequency facial details. However, they pay less attention to mid-frequency facial details, and the displacement maps they learn contain noise, which reduces reconstruction accuracy. This work therefore presents a novel approach to regressing accurate and detailed 3D face shapes. First, we design a novel feature consistency loss to recover mid-frequency facial details. Specifically, we exploit the powerful CLIP model as prior knowledge of faces to extract geometric and semantic features, which guide the reconstructed 3D geometric details to match the local details in the input image. Second, we propose a parameter refinement module that learns fine-grained features, yielding more accurate model parameters and thereby more accurate facial reconstruction. Extensive experiments on the FaceScape and REALY benchmarks demonstrate that our method outperforms several state-of-the-art methods in reconstruction accuracy. In addition, comprehensive qualitative results show that our approach produces visually better reconstructions than existing methods.
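
The abstract does not give the formulation of the feature consistency loss. As an illustration only, the sketch below assumes one plausible form: the mean cosine distance between feature vectors extracted from the input image and from a rendering of the reconstructed face. The function names and the use of plain Python lists as stand-ins for encoder features are assumptions made for this example, not the authors' implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def feature_consistency_loss(
    input_feats: Sequence[Vector],
    rendered_feats: Sequence[Vector],
) -> float:
    """Hypothetical loss: mean cosine distance over paired feature vectors.

    input_feats    -- features of the input photograph (placeholder values here)
    rendered_feats -- features of the rendered reconstruction (placeholder values)
    Returns 0.0 when every pair points in the same direction and grows
    toward 2.0 as the paired features diverge.
    """
    if not input_feats or len(input_feats) != len(rendered_feats):
        raise ValueError("need two non-empty, equally sized feature lists")
    distances = [
        1.0 - cosine_similarity(a, b)
        for a, b in zip(input_feats, rendered_feats)
    ]
    return sum(distances) / len(distances)
```

In a real training setup the feature vectors would come from a frozen image encoder applied to both images, and the loss would be computed on tensors so that gradients can flow back to the reconstruction network; the pure-Python version here only shows the arithmetic.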

Computational Visual Media
Pages 85-103

Cite this article:
Bao Y, Zhou P, Qi L, et al. FaceCLIP: CLIP-driven accurate and detailed 3D face reconstruction from a single image. Computational Visual Media, 2026, 12(1): 85-103. https://doi.org/10.26599/CVM.2025.9450434

Views: 1058
Downloads: 61
Crossref: 0
Web of Science: 0
Scopus: 0
CSCD: 0

Received: 16 January 2024
Accepted: 19 April 2024
Published: 02 February 2026
© The Author(s) 2025.

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

To submit a manuscript, please go to https://jcvm.org.