Open Access Review Article Issue
A survey of deep learning-based 3D shape generation
Computational Visual Media 2023, 9 (3): 407-442
Published: 18 May 2023

Deep learning has been successfully used for tasks in the 2D image domain. Research on 3D computer vision and deep geometry learning has also attracted attention. Considerable achievements have been made regarding feature extraction and discrimination of 3D shapes. Following recent advances in deep generative models such as generative adversarial networks, effective generation of 3D shapes has become an active research topic. Unlike 2D images with a regular grid structure, 3D shapes have various representations, such as voxels, point clouds, meshes, and implicit functions. For deep learning of 3D shapes, shape representation has to be taken into account as there is no unified representation that can cover all tasks well. Factors such as the representativeness of geometry and topology often largely affect the quality of the generated 3D shapes. In this survey, we comprehensively review works on deep-learning-based 3D shape generation by classifying and discussing them in terms of the underlying shape representation and the architecture of the shape generator. The advantages and disadvantages of each class are further analyzed. We also consider the 3D shape datasets commonly used for shape generation. Finally, we present several potential research directions that hopefully can inspire future works on this topic.
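The survey's point that 3D shapes admit several interchangeable representations can be made concrete with a minimal sketch: converting a binary voxel occupancy grid into a point cloud by taking the center of each occupied cell. The function name and voxel-size parameter are illustrative, not from the survey.

```python
import numpy as np

def voxels_to_points(occupancy, voxel_size=1.0):
    """Convert a binary voxel occupancy grid to a point cloud by taking
    the center of each occupied cell (a minimal illustration of moving
    between two of the representations discussed above)."""
    idx = np.argwhere(occupancy)      # (M, 3) indices of occupied cells
    return (idx + 0.5) * voxel_size   # cell centers as 3D points

grid = np.zeros((4, 4, 4), dtype=bool)
grid[0, 0, 0] = True
grid[1, 2, 3] = True
pts = voxels_to_points(grid)
print(pts.shape)  # (2, 3)
```

The reverse direction (points to voxels, or either to a mesh or implicit function) is lossy in general, which is one reason no single representation covers all generation tasks well.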

Open Access Review Article Issue
Attention mechanisms in computer vision: A survey
Computational Visual Media 2022, 8 (3): 331-368
Published: 15 March 2022

Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
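The "dynamic weight adjustment" view of attention can be sketched with channel attention in the squeeze-and-excitation style, one of the categories the survey covers. This is a generic numpy illustration with hypothetical learned weights `w1` and `w2`, not code from the survey.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative).
    feat: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) are
    hypothetical learned bottleneck weights with reduction ratio r."""
    z = feat.mean(axis=(1, 2))                    # squeeze: global average pool -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))     # excitation: per-channel weights in (0, 1)
    return feat * s[:, None, None]                # reweight channels based on input content

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
out = channel_attention(feat, w1, w2)
print(out.shape)  # (8, 4, 4)
```

The weights `s` depend on the input itself, which is exactly the dynamic, content-based reweighting that distinguishes attention from a fixed learned scaling.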

Open Access Research Article Issue
HDR-Net-Fusion: Real-time 3D dynamic scene reconstruction with a hierarchical deep reinforcement network
Computational Visual Media 2021, 7 (4): 419-435
Published: 05 August 2021

Reconstructing dynamic scenes with commodity depth cameras has many applications in computer graphics, computer vision, and robotics. However, due to the presence of noise and erroneous observations from data capturing devices and the inherently ill-posed nature of non-rigid registration with insufficient information, traditional approaches often produce low-quality geometry with holes, bumps, and misalignments. We propose a novel 3D dynamic reconstruction system, named HDR-Net-Fusion, which learns to simultaneously reconstruct and refine the geometry on the fly with a sparse embedded deformation graph of surfels, using a hierarchical deep reinforcement (HDR) network. The latter comprises two parts: a global HDR-Net which rapidly detects local regions with large geometric errors, and a local HDR-Net serving as a local patch refinement operator to promptly complete and enhance such regions. Training the global HDR-Net is formulated as a novel reinforcement learning problem to implicitly learn the region selection strategy with the goal of improving the overall reconstruction quality. The applicability and efficiency of our approach are demonstrated using a large-scale dynamic reconstruction dataset. Our method can reconstruct geometry with higher quality than traditional methods.

Open Access Short Communication Issue
Can attention enable MLPs to catch up with CNNs?
Computational Visual Media 2021, 7 (3): 283-288
Published: 27 July 2021

Open Access Research Article Issue
PCT: Point cloud transformer
Computational Visual Media 2021, 7 (2): 187-199
Published: 10 April 2021

The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on Transformer, which has achieved huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant for processing a sequence of points, making it well-suited for point cloud learning. To better capture local context within the point cloud, we enhance input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
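The permutation-invariance property that makes Transformers a natural fit for point sets can be checked directly. The sketch below is a generic single-head self-attention with hypothetical projection weights, not PCT's actual offset-attention: per-point outputs are permutation equivariant, and a symmetric pooling over points then yields a permutation-invariant global feature.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def point_self_attention(points, wq, wk, wv):
    """Single-head self-attention over a point set (illustrative sketch).
    points: (N, d) features; wq/wk/wv: (d, d) hypothetical projections."""
    q, k, v = points @ wq, points @ wk, points @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)  # (N, N)
    return attn @ v  # each output attends to all points

rng = np.random.default_rng(1)
pts = rng.standard_normal((16, 8))
wq, wk, wv = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
perm = rng.permutation(16)
out = point_self_attention(pts, wq, wk, wv)
out_perm = point_self_attention(pts[perm], wq, wk, wv)
# Permuting input points permutes outputs the same way (equivariance)...
assert np.allclose(out[perm], out_perm)
# ...and max-pooling over points gives a permutation-invariant global feature.
assert np.allclose(out.max(axis=0), out_perm.max(axis=0))
print("permutation checks passed")
```

Because attention contains no positional ordering assumptions, no canonical point ordering or voxelization is needed, which is the key advantage over grid-based architectures for raw point clouds.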

Open Access Research Article Issue
Detecting human-object interaction with multi-level pairwise feature network
Computational Visual Media 2021, 7 (2): 229-239
Published: 19 October 2020

Human-object interaction (HOI) detection, which aims to infer ⟨human, action, object⟩ triplets within an image, is crucial for human-centric image understanding. Recent studies often exploit visual features and the spatial configuration of a human-object pair in order to learn the action linking the human and object in the pair. We argue that such a paradigm of pairwise feature extraction and action inference can be applied not only at the whole human and object instance level, but also at the part level, at which a body part interacts with an object, and at the semantic level, by considering the semantic label of an object along with human appearance and human-object spatial configuration, to infer the action. We thus propose a multi-level pairwise feature network (PFNet) for detecting human-object interactions. The network consists of three parallel streams that characterize HOI using pairwise features at the above three levels; the three streams are finally fused to give the action prediction. Extensive experiments show that our proposed PFNet outperforms other state-of-the-art methods on the V-COCO dataset and achieves comparable results to the state-of-the-art on the HICO-DET dataset.

Open Access Research Article Issue
A new dataset of dog breed images and a benchmark for fine-grained classification
Computational Visual Media 2020, 6 (4): 477-487
Published: 01 October 2020

In this paper, we introduce an image dataset for fine-grained classification of dog breeds: the Tsinghua Dogs Dataset. It is currently the largest dataset for fine-grained classification of dogs, comprising 130 dog breeds and 70,428 real-world images. Each image contains a single dog and provides annotated bounding boxes for the whole body and head. In comparison to previous similar datasets, it contains more breeds and more carefully chosen images for each breed. The diversity within each breed is greater, with between 200 and 7000+ images per breed. Annotation of the whole body and head makes the dataset suitable not only for improving fine-grained image classification models based on overall features, but also for those locating local informative parts. We show that the dataset provides a tough challenge by benchmarking several state-of-the-art deep neural models. The dataset is available for academic purposes at

Survey Issue
Lane Detection: A Survey with New Results
Journal of Computer Science and Technology 2020, 35 (3): 493-505
Published: 29 May 2020

Lane detection is essential for many aspects of autonomous driving, such as lane-based navigation and high-definition (HD) map modeling. Although lane detection is challenging, especially under complex road conditions, considerable progress has been witnessed in this area in the past several years. In this survey, we review recent vision-based lane detection datasets and methods. For datasets, we categorize them by annotations, provide detailed descriptions for each category, and show comparisons among them. For methods, we focus on methods based on deep learning and organize them in terms of their detection targets. Moreover, we introduce a new dataset with more detailed annotations for HD map modeling, a new direction for lane detection that is applicable to autonomous driving in complex road conditions, and a deep neural network, LineNet, for lane detection, and show its application to HD map modeling.
