Open Access Research Article
GRIG: Data-efficient generative residual image inpainting
Computational Visual Media 2025, 11(6): 1329-1361
Published: 12 December 2025
PDF (110.1 MB)
Downloads: 37

Image inpainting is the task of filling in missing or masked regions of an image with semantically meaningful content. Recent methods have shown significant improvements in handling large missing regions, but they usually require large training datasets to achieve satisfactory results, and training such models from only a small number of samples has received little attention. To address this, we present a novel data-efficient generative residual image inpainting method that produces high-quality results. The core idea is an iterative residual reasoning scheme that combines convolutional neural networks (CNNs) for feature extraction with transformers for global reasoning inside a generative adversarial network equipped with image-level and patch-level discriminators. We also propose a novel forged-patch adversarial training strategy to create faithful textures and detailed appearances. Extensive evaluation shows that our method outperforms previous methods on the data-efficient image inpainting task, both quantitatively and qualitatively.
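The iterative residual idea described in the abstract can be sketched independently of the paper's actual networks: starting from the masked image, a model repeatedly predicts a residual that is added back only inside the missing region. The sketch below is a minimal illustration under that assumption; `neighbour_mean_residual` is a toy stand-in for the learned generator, not the authors' CNN–transformer model.

```python
import numpy as np

def iterative_residual_inpaint(image, mask, predict_residual, n_iters=8):
    """Fill masked pixels by repeatedly adding a predicted residual.

    image: HxW float array; mask: HxW bool, True where pixels are missing.
    predict_residual: callable standing in for the learned generator.
    """
    est = image * (~mask)  # zero out the unknown region
    for _ in range(n_iters):
        residual = predict_residual(est, mask)
        est = est + residual * mask  # update only the missing region
    return est

def neighbour_mean_residual(est, mask):
    """Toy residual: move each pixel toward its 3x3 neighbourhood mean."""
    padded = np.pad(est, 1, mode="edge")
    h, w = est.shape
    neigh = sum(padded[dy:dy + h, dx:dx + w]
                for dy in range(3) for dx in range(3)) / 9.0
    return neigh - est
```

With this toy residual, a hole in a constant image is progressively pulled toward the surrounding intensity over the iterations, which is the behaviour the residual-reasoning loop is meant to exhibit.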

Open Access Research Article
Emotion amplification of facial videos using a fine-tuned StyleGAN
Computational Visual Media 2025, 11(3): 587-601
Published: 19 May 2025
PDF (21.8 MB)
Downloads: 102

The ability to exhibit appropriate emotions is crucial for the expressiveness and attractiveness of facial videos. However, controlling the level of emotion is difficult, even for experienced actors and for amateur podcasters on social networks. In this study, we aim to solve the novel problem of semantically amplifying the emotions of a facial video. This poses new challenges for effectively editing a sequence of video frames in terms of face semantics, emotion adaptiveness, and temporal coherence. Our approach is based on semantic face editing in the disentangled latent space of a state-of-the-art StyleGAN model. We present a new face dataset with diverse emotions to fine-tune the pretrained StyleGAN and improve the expressiveness of its original emotion-biased latent space. We construct an emotion-editing subspace that allows adaptive emotion amplification while preserving other facial attributes, and we further propose an effective stitching-tuning technique to ensure temporally coherent video frames. Our method produces plausible emotion amplification for a wide range of facial videos. Qualitative and quantitative evaluations demonstrate its advantages over baseline methods. The proposed dataset and research code will be made publicly available.
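Latent-space emotion amplification of this kind is often expressed as rescaling the component of a latent code along a learned emotion direction while leaving the orthogonal (identity and attribute) component untouched. The sketch below illustrates that idea on plain vectors; the function name and the simple linear model are illustrative assumptions, not the paper's emotion-editing subspace.

```python
import numpy as np

def amplify_emotion(w, emotion_dir, gain):
    """Rescale the component of a latent code along an emotion direction.

    w: (d,) latent code; emotion_dir: (d,) direction (need not be unit length);
    gain > 1 amplifies the emotion, gain < 1 attenuates it. Components
    orthogonal to emotion_dir (other facial attributes) are left unchanged.
    """
    d = emotion_dir / np.linalg.norm(emotion_dir)
    coeff = w @ d                      # strength of the emotion component
    return w + (gain - 1.0) * coeff * d
```

A code orthogonal to the emotion direction passes through unchanged, which is the attribute-preservation property the abstract attributes to its editing subspace.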

Open Access Research Article
Script-to-Storyboard: A new contextual retrieval dataset and benchmark
Computational Visual Media 2025, 11(1): 103-122
Published: 28 February 2025
PDF (27.4 MB)
Downloads: 231

Storyboards, comprising key illustrations and images, help filmmakers outline ideas, key moments, and story events when filming movies. Inspired by this, we introduce Script-to-Storyboard (Sc2St), the first contextual benchmark dataset composed of storyboards to explicitly express story structures in the movie domain, and propose a contextual retrieval task to facilitate movie story understanding. Unlike existing movie datasets, the Sc2St dataset contains fine-grained and diverse texts, annotated semantic keyframes, and coherent storylines in storyboards. The contextual retrieval task takes as input a multi-sentence movie script summary together with a keyframe history, and aims to retrieve the future keyframe described by the corresponding sentence to form the storyboard. Compared to classic text-based visual retrieval tasks, this requires capturing context from both the description (script) and the keyframe history. We benchmark existing text-based visual retrieval methods on the new dataset and propose a recurrence-based framework with three variants for effective context encoding. Comprehensive experiments demonstrate that our methods compare favourably to existing ones, and ablation studies validate the effectiveness of the proposed context-encoding approaches.
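The contextual retrieval setting, where a script sentence and keyframe history jointly determine the next keyframe, can be sketched with a fused context vector scored against candidate embeddings. The mean-pooled history below is a deliberately simple stand-in for the paper's recurrent encoder; the function name and the mixing weight `alpha` are assumptions for illustration.

```python
import numpy as np

def retrieve_next_keyframe(script_emb, history_embs, candidate_embs, alpha=0.5):
    """Rank candidate keyframes against a fused script-plus-history context.

    script_emb: (d,) embedding of the script sentence; history_embs: list of
    (d,) embeddings of previously chosen keyframes; candidate_embs: (K, d).
    Returns the index of the best candidate and the full score vector.
    """
    if len(history_embs):
        history = np.mean(history_embs, axis=0)
    else:
        history = np.zeros_like(script_emb)
    context = alpha * script_emb + (1 - alpha) * history
    context = context / np.linalg.norm(context)
    cands = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = cands @ context           # cosine similarity to the context
    return int(np.argmax(scores)), scores
```

The point of the sketch is the contrast with classic text-based retrieval: the ranking depends on the keyframe history as well as the sentence, so the same sentence can retrieve different keyframes in different story contexts.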

Open Access Review Article
A survey of deep learning-based 3D shape generation
Computational Visual Media 2023, 9(3): 407-442
Published: 18 May 2023
PDF (8.9 MB)
Downloads: 766

Deep learning has been used successfully for tasks in the 2D image domain, and research on 3D computer vision and deep geometry learning has also attracted increasing attention. Considerable achievements have been made in feature extraction and discrimination of 3D shapes. Following recent advances in deep generative models such as generative adversarial networks, effective generation of 3D shapes has become an active research topic. Unlike 2D images, which have a regular grid structure, 3D shapes have various representations, such as voxels, point clouds, meshes, and implicit functions. For deep learning of 3D shapes, the shape representation must be taken into account, as no unified representation covers all tasks well. Factors such as how well a representation captures geometry and topology often largely affect the quality of the generated 3D shapes. In this survey, we comprehensively review work on deep-learning-based 3D shape generation, classifying and discussing methods in terms of the underlying shape representation and the architecture of the shape generator. The advantages and disadvantages of each class are further analyzed. We also review the 3D shape datasets commonly used for shape generation. Finally, we present several potential research directions that we hope will inspire future work on this topic.
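The abstract's central point, that 3D shapes admit several interchangeable but lossy representations, can be made concrete with a conversion between two of them. The sketch below (illustrative helper names, not from the survey) converts an occupancy voxel grid to a point cloud of cell centres and rasterises it back, making visible what each representation keeps and discards.

```python
import numpy as np

def voxels_to_point_cloud(voxels, threshold=0.5):
    """Convert an occupancy voxel grid to points at occupied cell centres,
    normalised into the unit cube."""
    idx = np.argwhere(voxels > threshold)          # (N, 3) integer indices
    return (idx + 0.5) / np.array(voxels.shape)

def point_cloud_to_voxels(points, resolution):
    """Rasterise unit-cube points back into a boolean occupancy grid."""
    grid = np.zeros((resolution,) * 3, dtype=bool)
    idx = np.clip((points * resolution).astype(int), 0, resolution - 1)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```

The round trip is exact at matching resolution, but converting to a coarser grid merges nearby points, which is one reason generation quality depends on the chosen representation.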

Open Access Research Article
Interactive modeling of lofted shapes from a single image
Computational Visual Media 2020, 6(3): 279-289
Published: 04 December 2019
PDF (773.7 KB)
Downloads: 79

Modeling the complete geometry of a general shape from a single image is an ill-posed problem, and user hints are often incorporated to resolve ambiguities and guide the modeling process. In this work, we present a novel interactive approach for extracting high-quality freeform shapes from a single image. It is inspired by the lofting technique popular in many CAD systems and requires only minimal user input. Given an input image, the user need only sketch several projected cross-sections, provide a "main axis", and specify some geometric relations. Our algorithm then automatically optimizes the common normal of the sections with respect to these constraints and interpolates between the sections, producing a high-quality 3D model that conforms to both the original image and the user input. The entire modeling session is efficient and intuitive. We demonstrate the effectiveness of our approach through qualitative tests on a variety of images and quantitative comparisons with ground truth on synthetic images.
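The core lofting step, interpolating between user-sketched cross-sections along a main axis, can be sketched as follows. This is a minimal linear-interpolation version under simplifying assumptions (sections share a vertex count, are evenly spaced, and the axis is the z-axis); the paper's constrained optimization of the common normal is not modelled here.

```python
import numpy as np

def loft_sections(sections, axis_samples):
    """Interpolate 2D cross-sections along a main axis into a 3D surface.

    sections: list of (N, 2) arrays with matching vertex counts, assumed
    evenly spaced along the axis; axis_samples: number of output rings.
    Returns an (axis_samples, N, 3) array of surface points.
    """
    sections = np.asarray(sections, dtype=float)   # (S, N, 2)
    s_count = len(sections)
    rings = []
    for t in np.linspace(0.0, 1.0, axis_samples):
        pos = t * (s_count - 1)                    # fractional section index
        i = min(int(pos), s_count - 2)
        f = pos - i
        ring2d = (1 - f) * sections[i] + f * sections[i + 1]
        # Place the blended ring at parameter t along the (assumed) z-axis.
        ring = np.column_stack([ring2d, np.full(len(ring2d), t)])
        rings.append(ring)
    return np.stack(rings)
```

For example, lofting between a unit square and the same square scaled by two yields a ring scaled by 1.5 halfway along the axis, illustrating how sparse user sketches determine the full surface.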
