Representing 3D faces using generative models has been investigated for several years for its numerous applications in computer vision and graphics. However, general 3D face manipulation is often limited by the lack of multi-level interpretability of the latent space in 3D generative models. To address this problem, we propose a novel generative approach dubbed hierarchically semantic regularized variational auto-encoders (HSR-VAE), which explicitly endows latent variables with multi-grained semantics of the synthesized 3D face shapes. Specifically, to accommodate the hierarchical structure of the human face, we decompose the latent space to represent variations in facial features at different scales, from local facial segments to fine-grained attributes. Moreover, part-aware and attribute-aware semantic regularizers are introduced to establish a linkage between hierarchically organized latent variables and multi-grained facial semantics, allowing more interpretable and meaningful representations of the 3D face. Extensive quantitative and qualitative experiments show the effectiveness of HSR-VAE and demonstrate that it can provide a more interpretable, manipulable, and generalizable latent representation than current approaches, facilitating a wide range of 3D face shape manipulation tasks.

Survey Issue

A Survey of Multimodal Controllable Diffusion Models

Rui Jiang, Guang-Cong Zheng, Teng Li, Tian-Rui Yang, Jing-Dong Wang, Xi Li

Journal of Computer Science and Technology 2024, 39(3): 509-541

Published: 22 July 2024

Diffusion models have recently emerged as powerful generative models, producing high-fidelity samples across domains. Despite this, they face two key challenges: accelerating the time-consuming iterative generation process, and controlling and steering that process. Existing surveys provide broad overviews of diffusion model advancements but lack comprehensive coverage centered specifically on techniques for controllable generation. This survey seeks to address that gap by providing a comprehensive and coherent review of controllable generation in diffusion models. We provide a detailed taxonomy defining controlled generation for diffusion models, categorizing controllable generation by formulation, methodology, and evaluation metrics. By enumerating the range of methods researchers have developed for enhanced control, we aim to establish controllable diffusion generation as a distinct subfield warranting dedicated focus. With this survey, we contextualize recent results, provide a dedicated treatment of controllable diffusion model generation, and outline limitations and future directions. To demonstrate applicability, we highlight controllable diffusion techniques applied to major computer vision tasks. By consolidating methods and applications for controllable diffusion models, we hope to catalyze further innovations in reliable and scalable controllable generation.
