Scholar - SciOpen

Masked autoencoders (MAEs) have recently achieved great success in computer vision. They automatically extract representations from unlabeled data and improve performance on a variety of downstream tasks. However, training an MAE model requires substantial resources, which puts it out of reach of many academic institutions: university laboratories often lack the necessary resources. This significantly hinders the development of the field. In this paper, we propose FastMAE, an efficient MAE approach. Inspired by offline tokenizers in natural language processing, FastMAE introduces a novel way to build an offline vision tokenizer that provides high-level semantics efficiently. Benefiting from the offline tokenizer, FastMAE becomes an efficient vision learner. Our experiments demonstrate that FastMAE achieves 83.6% accuracy with ViT-B in only 18.8 h on 8 NVIDIA Tesla V100 GPUs, which is 31.3× faster than the original MAE, providing a resource-friendly baseline for the computer vision community. Moreover, it achieves performance comparable to state-of-the-art methods. We hope our research will attract more people to MAE-related research so that we can advance its development together.

Open Access Review Article

Diffusion models for 3D generation: A survey

Chen Wang, Hao-Yang Peng, Ying-Tian Liu, Jiatao Gu, Shi-Min Hu

Computational Visual Media 2025, 11(1): 1-28

Published: 28 February 2025

Abstract



Denoising diffusion models have demonstrated tremendous success in modeling data distributions and synthesizing high-quality samples. In the 2D image domain, they have become the state of the art and can generate photo-realistic images with high controllability. More recently, researchers have begun to explore how to use diffusion models to generate 3D data, which has greater potential in real-world applications. This requires careful design choices in two key respects: identifying a suitable 3D representation and determining how to apply the diffusion process. In this survey, we provide the first comprehensive review of diffusion models for manipulating 3D content, including 3D generation, reconstruction, and 3D-aware image synthesis. We classify existing methods into three major categories: 2D space diffusion with pretrained models, 2D space diffusion without pretrained models, and 3D space diffusion. We also summarize popular datasets used for 3D generation with diffusion models. Along with this survey, we maintain a repository at https://github.com/cwchenwang/awesome-3d-diffusion to track the latest relevant papers and codebases. Finally, we outline current challenges for diffusion models in 3D generation and suggest future research directions.
