Transformers in computational visual media: A survey

Yifan Xu; Huapeng Wei; Minxuan Lin; Yingying Deng; Kekai Sheng; Mengdan Zhang; Fan Tang; Weiming Dong; Feiyue Huang; Changsheng Xu

doi:10.1007/s41095-021-0247-3

Review Article | Open Access

Yifan Xu^{1,2}, Huapeng Wei^{3}, Minxuan Lin^{1,2}, Yingying Deng^{1,2}, Kekai Sheng^{4}, Mengdan Zhang^{4}, Fan Tang^{3}, Weiming Dong^{1,2,5}, Feiyue Huang^{4}, Changsheng Xu^{1,2,5}

1 NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100040, China

3 School of Artificial Intelligence, Jilin University, Changchun 130012, China

4 Youtu Lab, Tencent Inc., Shanghai 200233, China

5 CASIA-LLVISION Joint Lab, Beijing 100190, China

Abstract

Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models, which use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works. We categorize them according to task scenario: backbone design, high-level vision, low-level vision and generation, and multimodal learning. Their key ideas are also analyzed. Differing from previous surveys, we mainly focus on visual transformer methods in low-level vision and generation. The latest works on backbone design are also reviewed in detail. For ease of understanding, we precisely describe the main contributions of the latest works in the form of tables. As well as giving quantitative comparisons, we also present image results for low-level vision and generation tasks. Computational costs and source code links for various important works are also given in this survey to assist further development.
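As an illustration of the self-attention mechanism mentioned in the abstract, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is not taken from the survey: the function names and the projection matrices `w_q`, `w_k`, and `w_v` are assumptions chosen for illustration, and practical models add multiple heads, positional encodings, and learned weights. The sketch shows why the computation can run in parallel and capture global information: every token's output is a weighted sum over all tokens, with no step-by-step recurrence.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: n tokens, each a list of d_model floats.
    w_q, w_k, w_v: hypothetical projection matrices of shape (d_model, d_k).
    Returns (outputs, weights); weights[i][j] is how strongly token i
    attends to token j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # All pairwise query-key scores are computed at once; no recurrence.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, attn = self_attention(tokens, identity, identity, identity)
    for row in attn:
        print([round(w, 3) for w in row])
```

Each row of the returned attention weights sums to 1, so every output token is a convex combination of the value vectors of all tokens in the sequence.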

Keywords

visual transformer; computational visual media (CVM); high-level vision; low-level vision; image generation; multi-modal learning

Computational Visual Media

Volume 8, Issue 1, March 2022

Pages 33-62

DOI: 10.1007/s41095-021-0247-3

Cite this article:

Xu Y, Wei H, Lin M, et al. Transformers in computational visual media: A survey. Computational Visual Media, 2022, 8(1): 33-62. https://doi.org/10.1007/s41095-021-0247-3

Received: 17 June 2021

Accepted: 16 July 2021

Published: 27 October 2021

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.