Scholar - SciOpen

This paper presents ScenePalette, a modeling tool that allows users to “draw” 3D scenes interactively by placing objects on a canvas based on their contextual relationship. ScenePalette is inspired by an important intuition which was often ignored in previous work: a real-world 3D scene consists of the contextually reasonable organization of objects, e.g. people typically place one double bed with several subordinate objects into a bedroom instead of different shapes of beds. ScenePalette, abstracts 3D repositories as multiplex networks and accordingly encodes implicit relations between or among objects. Specifically, basic statistics such as co-occurrence, in combination with advanced relations, are used to tackle object relationships of different levels. Extensive experiments demonstrate that the latent space of ScenePalette has rich contexts that are essential for contextual representation and exploration.

Survey Issue

Overcoming Spatial Constraints in VR: A Survey of Redirected Walking Techniques

Jia-Hong Liu, Yang-Fu Ren, Qi Wen Gan, Kui Huang, Fiona Xiao Yu Chen, Er-Xia Luo, Khang Yeu Tang, Yue-Yao Fu, Cheng-Wei Fan, Sen-Zhe Xu, Song-Hai Zhang

Journal of Computer Science and Technology 2024, 39(4): 841-870

Published: 20 September 2024

Abstract Collect Collected

As the virtual reality (VR) technology strives to provide immersive and natural user experiences, the challenge of aligning vast virtual environments with limited physical spaces remains significant. This survey comprehensively explores the advancements in redirected walking (RDW) techniques aimed at overcoming spatial constraints in VR. RDW addresses this by subtly manipulating users’ physical movements to allow for seamless navigation within constrained areas. The survey delves into gain perception mechanisms, detailing how slight discrepancies between virtual and real-world movements can be utilized without user awareness, thus extending the effective navigable space. Various RDW control algorithms for gain-based RDW are analyzed, highlighting their implementation and effectiveness in maintaining immersion and minimizing perceptual disturbances. Furthermore, novel methods extending beyond traditional gain-based techniques are discussed, showcasing innovative approaches that further refine VR interactions. The practical implications of RDW in enhancing safety and reducing physical collisions in VR environments are underscored, alongside its potential to improve user experience by aligning virtual exploration more closely with natural human behavior patterns. Through a thorough review of existing literature and recent advancements, this survey provides a systematic understanding for researchers, developers, and industry professionals. It underscores the importance of RDW in the future of VR, emphasizing RDW's role in making VR more accessible and practical across various applications, from education and training to therapy and entertainment. The paper concludes with a forward-looking perspective on the continued evolution and potential of RDW in revolutionizing virtual reality experiences.

Open Access Research Article Issue

AdaPIP: Adaptive picture-in-picture guidance for 360° film watching

Yi-Xiao Li, Guan Luo, Yi-Ke Xu, Yu He, Fang-Lue Zhang, Song-Hai Zhang

Computational Visual Media 2024, 10(3): 487-503

Published: 02 May 2024

Abstract

PDF (4.9 MB) Collect Collected

Downloads：30

360 $^{\circ}$ videos enable viewers to watch freelyfrom different directions but inevitably prevent them from perceiving all the helpful information. To mitigate this problem, picture-in-picture (PIP) guidance was proposed using preview windows to show regions of interest (ROIs) outside the current view range. We identify several drawbacks of this representation and propose a new method for 360 $^{\circ}$ film watching called AdaPIP. AdaPIP enhances traditional PIP by adaptively arranging preview windows with changeable view ranges and sizes. In addition, AdaPIP incorporates the advantage of arrow-based guidance by presenting circular windows with arrows attached to them to help users locate the corresponding ROIs more efficiently. We also adapted AdaPIP and Outside-In to HMD-based immersive virtual reality environments to demonstrate the usability of PIP-guided approaches beyond 2D screens. Comprehensive user experiments on 2D screens, as well as in VR environments, indicate that AdaPIP is superior to alternative methods in terms of visual experiences while maintaining a comparable degree of immersion.

Regular Paper Issue

Learning Local Contrast for Crisp Edge Detection

Xiao-Nan Fang, Song-Hai Zhang

Journal of Computer Science and Technology 2023, 38(3): 554-566

Published: 30 May 2023

Abstract Collect Collected

In recent years, the accuracy of edge detection on several benchmarks has been significantly improved by deep learning based methods. However, the prediction of deep neural networks is usually blurry and needs further post-processing including non-maximum suppression and morphological thinning. In this paper, we demonstrate that the blurry effect arises from the binary cross-entropy loss, and crisp edges could be obtained directly from deep convolutional neural networks. We propose to learn edge maps as the representation of local contrast with a novel local contrast loss. The local contrast is optimized in a stochastic way to focus on specific edge directions. Experiments show that the edge detection network trained with local contrast loss achieves a high accuracy comparable to previous methods and dramatically improves the crispness. We also present several applications of the crisp edges, including image completion, image retrieval, sketch generation, and video stylization.

Open Access Research Article Issue

Focusing on your subject: Deep subject-aware image composition recommendation networks

Guo-Ye Yang, Wen-Yang Zhou, Yun Cai, Song-Hai Zhang, Fang-Lue Zhang

Computational Visual Media 2023, 9(1): 87-107

Published: 18 October 2022

Abstract

PDF (11 MB) Collect Collected

Downloads：59

Photo composition is one of the most important factors in the aesthetics of photographs. As a popular application, composition recommendation for a photo focusing on a specific subject has been ignored by recent deep-learning-based composition recommendation approaches. In this paper, we propose a subject-aware image composition recommendation method, SAC-Net, which takes an RGB image and a binary subject window mask as input, and returns good compositions as crops containing the subject. Our model first determines candidate scores for all possible coarse cropping windows. The crops with high candidate scores are selected and further refined by regressing their corner points to generate the output recommended cropping windows. The final scores of the refined crops are predicted by a final score regression module. Unlike existing methods that need to preset several cropping windows, our network is able to automatically regress cropping windows with arbitrary aspect ratios and sizes. We propose novel stability losses for maximizing smoothness when changing cropping windows along with view changes. Experimental results show that our method outperforms state-of-the-art methods not only on the subject-aware image composition recommendation task, but also for general purpose composition recommendation. We also have designed a multi-stage labeling scheme so that a large amount ofranked pairs can be produced economically. Weuse this scheme to propose the first subject-aware composition dataset SACD, which contains 2777 images, and more than 5 million composition ranked pairs. The SACD dataset is publicly available at https://cg.cs.tsinghua.edu.cn/SACD/.

Regular Paper Issue

Local Homography Estimation on User-Specified Textureless Regions

Zheng Chen, Xiao-Nan Fang, Song-Hai Zhang

Journal of Computer Science and Technology 2022, 37(3): 615-625

Published: 31 May 2022

Abstract Collect Collected

This paper presents a novel deep neural network for designated point tracking (DPT) in a monocular RGB video, VideoInNet. More concretely, the aim is to track four designated points correlated by a local homography on a textureless planar region in the scene. DPT can be applied to augmented reality and video editing, especially in the field of video advertising. Existing methods predict the location of four designated points without appropriately considering the point correlation. To solve this problem, VideoInNet predicts the motion of the four designated points correlated by a local homography within the heatmap prediction framework. Our network refines the heatmaps of designated points through two stages. On the first stage, we introduce a context-aware and location-aware structure to learn a local homography for the designated plane in a supervised way. On the second stage, we introduce an iterative heatmap refinement module to improve the tracking accuracy. We propose a dataset focusing on textureless planar regions, named ScanDPT, for training and evaluation. We show that the error rate of VideoInNet is about 29% lower than that of the state-of-the-art approach when testing in the first 120 frames of testing videos on ScanDPT.

Open Access Review Article Issue

Attention mechanisms in computer vision: A survey

Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, Shi-Min Hu

Computational Visual Media 2022, 8(3): 331-368

Published: 15 March 2022

Abstract

PDF (2.7 MB) Collect Collected

Downloads：457

Humans can naturally and effectively find salient regions in complex scenes. Motivated by thisobservation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.

Open Access Review Article Issue

Deep image synthesis from intuitive user input: A review and perspectives

Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang

Computational Visual Media 2022, 8(1): 3-31

Published: 27 October 2021

Abstract

PDF (4.6 MB) Collect Collected

Downloads：29

In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While classically, works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross fertilization between major image generation paradigms, and evaluation and comparison of generation methods.

Open Access Research Article Issue

Smoothness preserving layout for dynamic labels by hybrid optimization

Yu He, Guo-Dong Zhao, Song-Hai Zhang

Computational Visual Media 2022, 8(1): 149-163

Published: 27 October 2021

Abstract

PDF (2.1 MB) Collect Collected

Downloads：15

Stable label movement and smooth label trajectory are critical for effective information understanding. Sudden label changes cannot be avoided by whatever forced directed methods due to the unreliability of resultant force or global optimization methods due to the complex trade-off on the different aspects. To solve this problem, we proposed a hybrid optimization method by taking advantages of the merits of both approaches. We first detect the spatial-temporal intersection regions from whole trajectories of the features, and initialize the layout by optimization in decreasing order by the number of the involved features. The label movements between the spatial-temporal intersection regions are determined by force directed methods. To cope with some features with high speed relative to neighbors, we introduced a force from future, called temporal force, so that the labels of related features can elude ahead of time and retain smooth movements. We also proposed a strategy by optimizing the label layout to predict the trajectories of features so that such global optimization method can be applied to streaming data.

Open Access Research Article Issue

A new dataset of dog breed images and a benchmark for fine-grained classification

Ding-Nan Zou, Song-Hai Zhang, Tai-Jiang Mu, Min Zhang

Computational Visual Media 2020, 6(4): 477-487

Published: 01 October 2020

Abstract

PDF (956.9 KB) Collect Collected

Downloads：86

In this paper, we introduce an image dataset for fine-grained classification of dog breeds: the Tsinghua Dogs Dataset. It is currently the largest dataset for fine-grained classification of dogs, including 130 dog breeds and 70,428 real-world images. It has only one dog in each image and provides annotated bounding boxes for the whole body and head. In comparison to previous similar datasets, it contains more breeds and more carefully chosen images for each breed. The diversity within each breed is greater, with between 200 and 7000+ images for each breed. Annotation of the whole body and head makes the dataset not only suitable for the improvement of fine-grained image classification models based on overall features, but also for those locating local informative parts. We show that dataset provides a tough challenge by benchmarking several state-of-the-art deep neural models. The dataset is available for academic purposes at https://cg.cs.tsinghua.edu.cn/ThuDogs/.

Total 17

<1/212 >GOpage