Spatial-Temporal Sequence Attention Based Efficient Transformer for Video Snow Removal

Tao Gao; Qianxi Zhang; Ting Chen; Yuanbo Wen

doi:10.26599/BDMA.2024.9020061

Open Access

Tao Gao^¹, Qianxi Zhang^², Ting Chen^³, Yuanbo Wen^³

^¹ School of Data Science and Artificial Intelligence, Chang’an University, Xi’an 710064, China

^² School of Information Engineering, Chang’an University, Xi’an 710064, China

^³ School of Information Engineering, Chang’an University, Xi’an 710064, China

Abstract

Video snow removal has great potential to enhance video quality and improve the performance of downstream computer vision tasks. Transformers have recently gained attention for their self-attention mechanism. However, self-attention consumes considerable memory, which limits its application to high-resolution video restoration. In this paper, we propose an efficient spatio-temporal Transformer for video desnowing. It uses spatio-temporal sequence attention to capture intra-frame spatial information and inter-frame temporal information in parallel, with much lower memory consumption than standard self-attention. Additionally, we mitigate the impact of snowflake occlusion on video frame alignment by leveraging an atmospheric scattering model. Furthermore, we introduce the concept of Neural Representation for Videos (NeRV) and use a recovery NeRV module to reconstruct the compressed video after multi-resolution feature extraction, further reducing computational cost. Extensive experiments demonstrate that the model achieves superior performance in video snow removal while significantly reducing the computational resources required.
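To give a rough sense of why standard self-attention is memory-hungry on video, the sketch below counts the entries of the attention score matrix for a clip. It compares global attention over all spatio-temporal tokens with a generic per-frame window attention. This is only an illustration under stated assumptions: the function names, the clip size, and the non-overlapping window partition are hypothetical and are not taken from the paper, whose spatio-temporal sequence attention is not specified in the abstract.

```python
def attention_entries(num_tokens: int) -> int:
    """Entries in one attention score matrix over num_tokens tokens."""
    return num_tokens * num_tokens


def full_attention_entries(frames: int, height: int, width: int) -> int:
    """Global self-attention: every token in the clip attends to every other."""
    return attention_entries(frames * height * width)


def windowed_attention_entries(frames: int, height: int, width: int,
                               window: int) -> int:
    """Per-frame attention restricted to non-overlapping window x window
    patches (a generic window attention, assumed here for illustration)."""
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")
    num_windows = frames * (height // window) * (width // window)
    return num_windows * attention_entries(window * window)


# Hypothetical clip: 5 frames of 64 x 64 tokens, 8 x 8 windows.
full = full_attention_entries(5, 64, 64)
windowed = windowed_attention_entries(5, 64, 64, 8)
print(full, windowed, full // windowed)
```

For this hypothetical clip the global score matrix has 320 times as many entries as the windowed one. The ratio equals the total token count divided by the tokens per window, so the gap widens as resolution or clip length grows.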

Keywords

video restoration; vision Transformer; window attention; computer vision; neural representation

Big Data Mining and Analytics

Volume 8, Issue 3, June 2025

Pages 551-562

DOI: 10.26599/BDMA.2024.9020061

Cite this article:

Gao T, Zhang Q, Chen T, et al. Spatial-Temporal Sequence Attention Based Efficient Transformer for Video Snow Removal. Big Data Mining and Analytics, 2025, 8(3): 551-562. https://doi.org/10.26599/BDMA.2024.9020061

Received: 24 April 2024

Revised: 01 June 2024

Accepted: 04 September 2024

Published: 04 April 2025

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).