Open Access Research Article Issue
AdaPIP: Adaptive picture-in-picture guidance for 360° film watching
Computational Visual Media 2024, 10 (3): 487-503
Published: 02 May 2024
PDF (4.9 MB)

360° videos let viewers look freely in any direction, but inevitably prevent them from perceiving all of the helpful information at once. To mitigate this problem, picture-in-picture (PIP) guidance was proposed, using preview windows to show regions of interest (ROIs) outside the current view range. We identify several drawbacks of this representation and propose a new method for 360° film watching called AdaPIP. AdaPIP enhances traditional PIP by adaptively arranging preview windows with changeable view ranges and sizes. In addition, AdaPIP incorporates the advantage of arrow-based guidance by presenting circular windows with attached arrows to help users locate the corresponding ROIs more efficiently. We also adapted AdaPIP and Outside-In to HMD-based immersive virtual reality environments to demonstrate the usability of PIP-guided approaches beyond 2D screens. Comprehensive user experiments on 2D screens, as well as in VR environments, indicate that AdaPIP is superior to alternative methods in terms of visual experience while maintaining a comparable degree of immersion.
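At its core, pointing an arrow from a preview window toward an off-screen ROI reduces to a signed angular offset between the current view direction and the ROI. A minimal sketch in Python; the function name and the degree-based yaw convention are ours for illustration, not from the paper:

```python
def arrow_angle(view_yaw_deg, roi_yaw_deg):
    """Signed yaw offset in (-180, 180] from the current view to an ROI.

    A positive result means the ROI lies to the right of the viewport,
    so the preview window's arrow should point right; negative means left.
    """
    diff = (roi_yaw_deg - view_yaw_deg) % 360.0
    if diff > 180.0:
        diff -= 360.0
    return diff
```

For example, with the viewer facing 350° and an ROI at 10°, the offset is +20°, so the arrow points slightly right rather than sweeping 340° the long way around.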

Regular Paper Issue
Learning Local Contrast for Crisp Edge Detection
Journal of Computer Science and Technology 2023, 38 (3): 554-566
Published: 30 May 2023

In recent years, the accuracy of edge detection on several benchmarks has been significantly improved by deep-learning-based methods. However, the predictions of deep neural networks are usually blurry and need further post-processing, including non-maximum suppression and morphological thinning. In this paper, we demonstrate that the blurry effect arises from the binary cross-entropy loss, and that crisp edges can be obtained directly from deep convolutional neural networks. We propose to learn edge maps as the representation of local contrast with a novel local contrast loss. The local contrast is optimized in a stochastic way to focus on specific edge directions. Experiments show that an edge detection network trained with the local contrast loss achieves accuracy comparable to previous methods while dramatically improving crispness. We also present several applications of crisp edges, including image completion, image retrieval, sketch generation, and video stylization.
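The intuition can be illustrated with a toy objective. This is our reading of the abstract, not the paper's exact formulation: instead of per-pixel binary cross-entropy (whose optimum under label uncertainty is a blurry average), supervise the *difference* between neighboring pixels along a randomly chosen direction, which rewards sharp transitions:

```python
import numpy as np

def local_contrast_loss(pred, target, rng):
    """Toy local-contrast objective (illustrative, not the paper's loss).

    Along one randomly chosen direction, the difference between each pixel
    and its neighbour in the predicted edge map should match the same
    difference in the ground-truth map.
    """
    # Stochastically pick one of four axis-aligned shift directions,
    # mirroring the abstract's stochastic focus on edge directions.
    dy, dx = [(0, 1), (0, -1), (1, 0), (-1, 0)][rng.integers(4)]
    pred_c = np.roll(pred, (dy, dx), axis=(0, 1)) - pred
    tgt_c = np.roll(target, (dy, dx), axis=(0, 1)) - target
    return float(np.mean((pred_c - tgt_c) ** 2))
```

A prediction that exactly reproduces the ground-truth contrasts incurs zero loss, whereas a blurred edge, whose per-step differences are smaller, does not.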

Open Access Research Article Issue
Focusing on your subject: Deep subject-aware image composition recommendation networks
Computational Visual Media 2023, 9 (1): 87-107
Published: 18 October 2022
PDF (11 MB)

Photo composition is one of the most important factors in the aesthetics of photographs. Despite its popularity as an application, composition recommendation for a photo focusing on a specific subject has been ignored by recent deep-learning-based composition recommendation approaches. In this paper, we propose a subject-aware image composition recommendation method, SAC-Net, which takes an RGB image and a binary subject window mask as input, and returns good compositions as crops containing the subject. Our model first determines candidate scores for all possible coarse cropping windows. The crops with high candidate scores are selected and further refined by regressing their corner points to generate the output recommended cropping windows. The final scores of the refined crops are predicted by a final score regression module. Unlike existing methods that need to preset several cropping windows, our network is able to automatically regress cropping windows with arbitrary aspect ratios and sizes. We propose novel stability losses for maximizing smoothness when changing cropping windows along with view changes. Experimental results show that our method outperforms state-of-the-art methods not only on the subject-aware image composition recommendation task, but also for general-purpose composition recommendation. We have also designed a multi-stage labeling scheme so that a large number of ranked pairs can be produced economically. We use this scheme to build the first subject-aware composition dataset, SACD, which contains 2777 images and more than 5 million composition ranked pairs. The SACD dataset is publicly available at https://cg.cs.tsinghua.edu.cn/SACD/.
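The coarse candidate stage can be pictured as enumerating grid-aligned windows that fully contain the subject, which are then scored and refined. A minimal sketch; the grid granularity and helper name are illustrative, since the actual network regresses corner points directly rather than searching a grid:

```python
def candidate_crops(img_w, img_h, subject, step=0.25):
    """Enumerate coarse cropping windows that fully contain the subject.

    `subject` is a window (x0, y0, x1, y1) in pixels.  The containment
    test keeps only crops in which the whole subject remains visible.
    """
    sx0, sy0, sx1, sy1 = subject
    xs = [round(img_w * i * step) for i in range(int(1 / step) + 1)]
    ys = [round(img_h * i * step) for i in range(int(1 / step) + 1)]
    crops = []
    for x0 in xs:
        for y0 in ys:
            for x1 in xs:
                for y1 in ys:
                    if (x0 <= sx0 and y0 <= sy0 and
                            x1 >= sx1 and y1 >= sy1 and
                            x1 > x0 and y1 > y0):
                        crops.append((x0, y0, x1, y1))
    return crops
```

Every surviving crop keeps the subject inside the frame; a scoring model then ranks them before corner refinement.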

Regular Paper Issue
Local Homography Estimation on User-Specified Textureless Regions
Journal of Computer Science and Technology 2022, 37 (3): 615-625
Published: 31 May 2022

This paper presents a novel deep neural network for designated point tracking (DPT) in a monocular RGB video, VideoInNet. More concretely, the aim is to track four designated points correlated by a local homography on a textureless planar region in the scene. DPT can be applied to augmented reality and video editing, especially in the field of video advertising. Existing methods predict the locations of the four designated points without appropriately considering their correlation. To solve this problem, VideoInNet predicts the motion of the four designated points correlated by a local homography within a heatmap prediction framework. Our network refines the heatmaps of the designated points in two stages. In the first stage, we introduce a context-aware and location-aware structure to learn a local homography for the designated plane in a supervised way. In the second stage, we introduce an iterative heatmap refinement module to improve the tracking accuracy. We propose a dataset focusing on textureless planar regions, named ScanDPT, for training and evaluation. We show that the error rate of VideoInNet is about 29% lower than that of the state-of-the-art approach when tested on the first 120 frames of the testing videos in ScanDPT.
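For reference, the local homography that correlates four designated points has a standard closed-form estimate via the direct linear transform (DLT); VideoInNet learns this mapping with a network instead, but the sketch below shows the quantity being predicted:

```python
import numpy as np

def homography_from_4pts(src, dst):
    """Estimate the 3x3 homography mapping four source points to four
    destination points using the standard DLT (SVD null-space) method."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the constraint matrix.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

Given consistent predictions for the four points, the recovered homography lets any other point on the designated plane be transferred between frames.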

Open Access Review Article Issue
Attention mechanisms in computer vision: A survey
Computational Visual Media 2022, 8 (3): 331-368
Published: 15 March 2022
PDF (2.7 MB)

Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository, https://github.com/MenghaoGuo/Awesome-Vision-Attentions, is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
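The "dynamic weight adjustment" view is easiest to see in channel attention. A squeeze-and-excitation-style block, sketched in NumPy with illustrative weight shapes, computes per-channel weights from the input itself and rescales each channel accordingly:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel attention on a (C, H, W) map.

    The gating weights are computed dynamically from the input, so the
    same parameters re-weight channels differently for every image.
    """
    squeeze = feat.mean(axis=(1, 2))                 # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)           # bottleneck + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate in (0, 1)
    return feat * weights[:, None, None]             # rescale each channel
```

Spatial attention is the analogous construction over the H×W axes, and branch attention over parallel network paths.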

Open Access Review Article Issue
Deep image synthesis from intuitive user input: A review and perspectives
Computational Visual Media 2022, 8 (1): 3-31
Published: 27 October 2021
PDF (4.6 MB)

In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While such automatic image content generation has classically followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works on image synthesis from intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross-fertilization between major image generation paradigms, and evaluation and comparison of generation methods.

Open Access Research Article Issue
Smoothness preserving layout for dynamic labels by hybrid optimization
Computational Visual Media 2022, 8 (1): 149-163
Published: 27 October 2021
PDF (2.1 MB)

Stable label movement and smooth label trajectories are critical for effective information understanding. Sudden label changes cannot be avoided by force-directed methods, due to the unreliability of the resultant force, nor by global optimization methods, due to the complex trade-offs among different aspects. To solve this problem, we propose a hybrid optimization method that combines the merits of both approaches. We first detect spatio-temporal intersection regions from the whole trajectories of the features, and initialize the layout by optimization, in decreasing order of the number of involved features. Label movements between the spatio-temporal intersection regions are determined by force-directed methods. To cope with features moving at high speed relative to their neighbors, we introduce a force from the future, called the temporal force, so that the labels of related features can move out of the way ahead of time and retain smooth trajectories. We also propose a strategy of optimizing the label layout to predict feature trajectories, so that the global optimization method can be applied to streaming data.
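A toy one-dimensional update illustrates how a temporal force differs from the usual attraction/repulsion terms; the coefficients, the 1/d repulsion, and the 1D setting are all ours for illustration, not the paper's formulation:

```python
def label_step(label, feature, neighbor_labels, future_neighbors,
               k_attract=0.2, k_repel=1.0, k_temporal=0.5):
    """One force-directed update for a 1D label position (toy sketch).

    The label is attracted to its own feature and repelled by neighbouring
    labels.  The extra "temporal force" also repels it from *future*
    positions of fast-moving neighbours, so it yields ahead of time and
    keeps its trajectory smooth instead of jumping at the last moment.
    """
    force = k_attract * (feature - label)
    for n in neighbor_labels:
        force += k_repel / (label - n) if label != n else k_repel
    for n in future_neighbors:
        force += k_temporal / (label - n) if label != n else k_temporal
    return label + force
```

With only a future neighbor approaching from the right, the label already drifts left before any present-time overlap occurs.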

Open Access Research Article Issue
A new dataset of dog breed images and a benchmark for fine-grained classification
Computational Visual Media 2020, 6 (4): 477-487
Published: 01 October 2020
PDF (956.9 KB)

In this paper, we introduce an image dataset for fine-grained classification of dog breeds: the Tsinghua Dogs Dataset. It is currently the largest dataset for fine-grained classification of dogs, including 130 dog breeds and 70,428 real-world images. Each image contains a single dog and provides annotated bounding boxes for the whole body and head. In comparison to previous similar datasets, it contains more breeds and more carefully chosen images for each breed. The diversity within each breed is greater, with between 200 and 7000+ images per breed. Annotation of the whole body and head makes the dataset suitable not only for improving fine-grained image classification models based on overall features, but also for those locating local informative parts. We show that the dataset poses a tough challenge by benchmarking several state-of-the-art deep neural models. The dataset is available for academic purposes at https://cg.cs.tsinghua.edu.cn/ThuDogs/.

Open Access Research Article Issue
What and where: A context-based recommendation system for object insertion
Computational Visual Media 2020, 6 (1): 79-93
Published: 02 April 2020
PDF (1.3 MB)

We propose a novel problem revolving around two tasks: (i) given a scene, recommend objects to insert, and (ii) given an object category, retrieve suitable background scenes. A bounding box for the inserted object is predicted in both tasks, which helps downstream applications such as semi-automated advertising and video composition. The major challenge lies in the fact that the target object is neither present nor localized in the input, and furthermore, available datasets only provide scenes with existing objects. To tackle this problem, we build an unsupervised algorithm based on object-level contexts, which explicitly models the joint probability distribution of object categories and bounding boxes using a Gaussian mixture model. Experiments on our own annotated test set demonstrate that our system outperforms existing baselines on all sub-tasks, and does so using a unified framework. Future extensions and applications are suggested.
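The Gaussian-mixture core of such a system can be sketched directly: each category's plausible bounding boxes are modeled by a mixture, and recommendation amounts to scoring a box under each category's density. The component parameters and category names below are illustrative, not from the paper:

```python
import math

def box_density(box, components):
    """Likelihood of a bounding box (x, y, w, h) under a diagonal-covariance
    Gaussian mixture; `components` is a list of (weight, mean, variance)."""
    total = 0.0
    for weight, mean, var in components:
        log_p = 0.0
        for b, m, v in zip(box, mean, var):
            log_p += -0.5 * ((b - m) ** 2 / v + math.log(2 * math.pi * v))
        total += weight * math.exp(log_p)
    return total

def recommend(box, category_models):
    """Recommend the category whose mixture gives the box the highest density."""
    return max(category_models,
               key=lambda c: box_density(box, category_models[c]))
```

For instance, a small box high on a wall scores better under a "painting" mixture centered near the top of the scene than under a floor-level "rug" mixture, so "painting" would be recommended for insertion there.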

Survey Issue
A Survey of 3D Indoor Scene Synthesis
Journal of Computer Science and Technology 2019, 34 (3): 594-608
Published: 10 May 2019

Indoor scene synthesis has become a popular topic in recent years. Synthesizing functional and plausible indoor scenes is an inherently difficult task, since it requires considerable knowledge both to choose reasonable object categories and to arrange objects appropriately. In this survey, we propose four criteria that group a wide range of 3D (three-dimensional) indoor scene synthesis techniques into four groups of categories. We comprehensively compare the techniques to demonstrate their effectiveness and drawbacks, offer hints for practitioners, and discuss potential remaining problems.
