Open Access Review Article Issue
Diffusion models for 3D generation: A survey
Computational Visual Media 2025, 11(1): 1-28
Published: 28 February 2025

Denoising diffusion models have demonstrated tremendous success in modeling data distributions and synthesizing high-quality samples. In the 2D image domain, they have become the state of the art, capable of generating photo-realistic images with high controllability. More recently, researchers have begun to explore how to use diffusion models to generate 3D data, since 3D content offers greater potential in real-world applications. This requires careful design choices in two key areas: identifying a suitable 3D representation and determining how to apply the diffusion process. In this survey, we provide the first comprehensive review of diffusion models for manipulating 3D content, including 3D generation, reconstruction, and 3D-aware image synthesis. We classify existing methods into three major categories: 2D space diffusion with pretrained models, 2D space diffusion without pretrained models, and 3D space diffusion. We also summarize popular datasets used for 3D generation with diffusion models. Along with this survey, we maintain a repository https://github.com/cwchenwang/awesome-3d-diffusion to track the latest relevant papers and codebases. Finally, we discuss the current challenges facing diffusion models for 3D generation and suggest future research directions.
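The denoising diffusion process underlying the surveyed methods admits a closed-form forward (noising) step: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. A minimal numpy sketch of this step, with illustrative names not taken from any specific paper's code:

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Variance schedule beta_1..beta_T (standard DDPM linear schedule)."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alphas_cumprod, noise):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

T = 1000
betas = linear_beta_schedule(T)
alphas_cumprod = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))       # a toy data sample (e.g., one slice of a 3D grid)
noise = rng.standard_normal((8, 8))

x_early = q_sample(x0, 10, alphas_cumprod, noise)    # mostly signal
x_late = q_sample(x0, T - 1, alphas_cumprod, noise)  # almost pure noise
```

A learned network then predicts the noise at each step to reverse this process; the survey's taxonomy turns on whether that denoiser operates in 2D image space or directly in a 3D representation.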

Open Access Editorial Issue
Message from the Editor-in-Chief
Computational Visual Media 2024, 10(1): 1
Published: 30 November 2023
Open Access Research Article Issue
Visual attention network
Computational Visual Media 2023, 9(4): 733-752
Published: 28 July 2023

While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structures; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large kernel attention (LKA) to enable self-adaptive and long-range correlations in self-attention while avoiding its shortcomings. Furthermore, we present a neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple, VAN achieves results comparable to similarly sized convolutional neural networks (CNNs) and vision transformers (ViTs) in various tasks, including image classification, object detection, semantic segmentation, panoptic segmentation, and pose estimation. For example, VAN-B6 achieves 87.8% accuracy on the ImageNet benchmark and sets new state-of-the-art performance (58.2% PQ) for panoptic segmentation. Moreover, VAN-B2 surpasses Swin-T by 4% mIoU (50.1% vs. 46.1%) for semantic segmentation on the ADE20K benchmark and by 2.6% AP (48.8% vs. 46.2%) for object detection on the COCO dataset. It provides a novel method and a simple yet strong baseline for the community. The code is available at https://github.com/Visual-Attention-Network.
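The key structural idea in LKA is decomposing a large-kernel convolution into a small depthwise convolution (local context), a dilated depthwise convolution (long-range context), and a 1x1 convolution (channel mixing), whose output reweights the input element-wise. A numpy sketch of that decomposition under assumed kernel sizes (5x5 and 7x7 with dilation 3, one common configuration); all function names are illustrative:

```python
import numpy as np

def depthwise_conv2d(x, w, dilation=1):
    """Naive depthwise 2D cross-correlation with 'same' zero padding.
    x: (C, H, W); w: (C, k, k), one k x k filter per channel."""
    C, H, W = x.shape
    k = w.shape[1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation
            out += w[:, i, j][:, None, None] * xp[:, di:di + H, dj:dj + W]
    return out

def lka(x, w_dw, w_dwd, w_pw, dilation=3):
    """Large kernel attention: depthwise conv -> dilated depthwise conv
    -> 1x1 conv, then the result reweights the input element-wise."""
    a = depthwise_conv2d(x, w_dw)                 # local context
    a = depthwise_conv2d(a, w_dwd, dilation)      # long-range context
    a = np.einsum('oc,chw->ohw', w_pw, a)         # 1x1 channel mixing
    return a * x                                  # attention as reweighting

rng = np.random.default_rng(0)
C, H, W = 4, 16, 16
x = rng.standard_normal((C, H, W))
w_dw = rng.standard_normal((C, 5, 5)) * 0.1       # 5x5 depthwise kernel
w_dwd = rng.standard_normal((C, 7, 7)) * 0.1      # 7x7 dilated depthwise kernel
w_pw = rng.standard_normal((C, C)) * 0.1          # 1x1 (pointwise) kernel
y = lka(x, w_dw, w_dwd, w_pw)
```

Because the three small convolutions have far fewer parameters than one dense large-kernel convolution, the cost stays linear in resolution, which is the "linear attention" claim in the abstract.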

Open Access Editorial Issue
Message from the Editor-in-Chief
Computational Visual Media 2023, 9(1): 1
Published: 18 October 2022
Regular Paper Issue
Preface
Journal of Computer Science and Technology 2022, 37(3): 559-560
Published: 31 May 2022
Open Access Review Article Issue
Attention mechanisms in computer vision: A survey
Computational Visual Media 2022, 8(3): 331-368
Published: 15 March 2022

Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
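The "dynamic weight adjustment" view can be made concrete with two of the surveyed categories: channel attention computes one weight per channel (squeeze-and-excitation style), spatial attention one weight per position. A minimal numpy sketch, assuming a tiny two-layer MLP for the excitation step (illustrative, not any specific paper's implementation):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x, w1, w2):
    """Channel attention: global-average-pool each channel, pass the
    pooled vector through a small MLP, and use the sigmoid output as
    a per-channel weight on the input."""
    s = x.mean(axis=(1, 2))                     # squeeze: (C,)
    z = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))   # excitation: (C,) in (0, 1)
    return z[:, None, None] * x                 # reweight channels

def spatial_attention(x, w):
    """Spatial attention: a 1x1 projection over channels yields one
    weight per spatial position."""
    m = sigmoid(np.einsum('c,chw->hw', w, x))   # (H, W) in (0, 1)
    return m[None, :, :] * x                    # reweight positions

rng = np.random.default_rng(0)
C, H, W = 8, 6, 6
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 2, C))           # reduction MLP layer
w2 = rng.standard_normal((C, C // 2))           # expansion MLP layer
w3 = rng.standard_normal(C)                     # 1x1 spatial projection
y_channel = channel_attention(x, w1, w2)
y_spatial = spatial_attention(x, w3)
```

In both cases the weights are recomputed from the input itself, which is exactly what distinguishes attention from a fixed, learned reweighting.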

Open Access Editorial Issue
Message from the Editor-in-Chief
Computational Visual Media 2022, 8(1): 1
Published: 27 October 2021

Open Access Short Communication Issue
Can attention enable MLPs to catch up with CNNs?
Computational Visual Media 2021, 7(3): 283-288
Published: 27 July 2021

Editorial Issue
Preface
Journal of Computer Science and Technology 2021, 36(3): 463-464
Published: 05 May 2021
Open Access Research Article Issue
PCT: Point cloud transformer
Computational Visual Media 2021, 7(2): 187-199
Published: 10 April 2021

The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant when processing a sequence of points, making it well-suited for point cloud learning. To better capture local context within the point cloud, we enhance the input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
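The two operations named for PCT's input embedding, farthest point sampling (to pick well-spread center points) and nearest neighbor search (to gather a local group around each center), can be sketched in a few lines of numpy; the helper names here are illustrative, not from the PCT codebase:

```python
import numpy as np

def farthest_point_sampling(points, m, seed=0):
    """Greedily pick m indices: each new point is the one farthest
    from all points chosen so far. points: (N, 3)."""
    n = points.shape[0]
    chosen = [int(np.random.default_rng(seed).integers(n))]
    min_dist = np.full(n, np.inf)
    for _ in range(m - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        min_dist = np.minimum(min_dist, d)        # distance to chosen set
        chosen.append(int(min_dist.argmax()))     # farthest remaining point
    return np.array(chosen)

def knn(points, queries, k):
    """Indices of the k nearest points for each query. Brute force:
    fine for illustration, not for large clouds."""
    d = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, :k]

rng = np.random.default_rng(0)
pts = rng.standard_normal((128, 3))              # toy point cloud
centers = pts[farthest_point_sampling(pts, 16)]  # 16 well-spread centers
neigh = knn(pts, centers, k=8)                   # local group per center
```

Each local group's features would then be aggregated into the center's embedding before the Transformer layers, giving the network the local context that raw per-point embeddings lack.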
