FaceCLIP: CLIP-driven accurate and detailed 3D face reconstruction from a single image

Yongtang Bao; Pengfei Zhou; Liang Qi; Yue Qi; Haojie Li

doi:10.26599/CVM.2025.9450434

Research Article | Open Access

Yongtang Bao^{1,2}, Pengfei Zhou^{2}, Liang Qi^{1}, Yue Qi^{2,3}, Haojie Li^{1}

^{1} School of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

^{2} State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China

^{3} Virtual Reality Research Institute, Beihang University Qingdao Research Institute, Qingdao 266100, China

Abstract

In recent years, 3D face reconstruction has become a research hotspot in computer graphics and computer vision. Most current 3DMM-based methods focus on learning displacement maps to recover high-frequency facial details. However, they pay less attention to mid-frequency facial details, and the displacement maps they learn contain noise, which lowers reconstruction accuracy. This work therefore presents a novel approach to regressing accurate and detailed 3D face shapes. First, we design a novel feature consistency loss to recover mid-frequency facial details. Specifically, we use CLIP as prior knowledge of faces to extract geometric and semantic features, which guide the reconstructed 3D geometric details to match the local details in the input image. Second, we propose a parameter refinement module that learns fine-grained features, yielding more accurate model parameters and improving reconstruction accuracy. Extensive experiments on the FaceScape and REALY benchmarks demonstrate that our method outperforms several state-of-the-art methods in reconstruction accuracy, and comprehensive qualitative results show that it achieves better visual quality than existing methods.
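The abstract does not give the form of the feature consistency loss, so the following is only a minimal sketch of one common way such a loss can be written: one minus the cosine similarity between a feature vector of the input image and a feature vector of the rendered reconstruction. The function names `cosine_similarity` and `feature_consistency_loss`, and the placeholder vectors standing in for CLIP image features, are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def feature_consistency_loss(
    input_feat: Sequence[float], rendered_feat: Sequence[float]
) -> float:
    """Assumed loss form: 1 - cosine similarity (0 when directions match).

    Hypothetical stand-in: in a real pipeline both vectors would come from
    an image encoder such as CLIP applied to the input photo and to a
    rendering of the reconstructed face.
    """
    return 1.0 - cosine_similarity(input_feat, rendered_feat)


# Placeholder vectors used in place of real encoder outputs.
input_features = [0.2, 0.5, 0.1, 0.7]
rendered_features = [0.25, 0.45, 0.05, 0.75]
loss = feature_consistency_loss(input_features, rendered_features)
```

A loss of this shape is small when the two feature vectors point in nearly the same direction, which is the sense in which the reconstruction would be pushed to match the input image in feature space. The paper's actual loss may differ in its features, distance measure, and weighting.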

Keywords

3D face reconstruction; CLIP; single image

Computational Visual Media

Volume 12, Issue 1, February 2026

Pages 85-103

DOI: 10.26599/CVM.2025.9450434

Cite this article:

Bao Y, Zhou P, Qi L, et al. FaceCLIP: CLIP-driven accurate and detailed 3D face reconstruction from a single image. Computational Visual Media, 2026, 12(1): 85-103. https://doi.org/10.26599/CVM.2025.9450434

Received: 16 January 2024

Accepted: 19 April 2024

Published: 02 February 2026

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
