iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks

Aman Chadha; John Britto; M. Mani Roja

doi:10.1007/s41095-020-0175-7

Computational Visual Media 2020, 6(3): 307-317 https://doi.org/10.1007/s41095-020-0175-7

Research Article | Open Access | Published: 20 July 2020


Aman Chadha¹, John Britto², M. Mani Roja³

1 Department of Computer Science, Stanford University, 450 Serra Mall, Stanford, CA 94305, USA

2 Department of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003, USA

3 Department of Electronics and Telecommunication Engineering, University of Mumbai, Mumbai, Maharashtra 400032, India

Keywords:

super resolution, video upscaling, frame recurrence, optical flow, generative adversarial networks, convolutional neural networks

Cite this article:

Chadha A, Britto J, Roja MM. iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks. Computational Visual Media, 2020, 6(3): 307-317. https://doi.org/10.1007/s41095-020-0175-7


Abstract

Recently, learning-based models have enhanced the performance of single-image super-resolution (SISR). However, applying SISR successively to each video frame leads to a lack of temporal coherency. Convolutional neural networks (CNNs) outperform traditional approaches in terms of image quality metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). On the other hand, generative adversarial networks (GANs) offer a competitive advantage by mitigating the lack of finer texture details usually seen with CNNs when super-resolving at large upscaling factors. We present iSeeBetter, a novel GAN-based spatio-temporal approach to video super-resolution (VSR) that renders temporally consistent super-resolution videos. iSeeBetter extracts spatial and temporal information from the current and neighboring frames using the concept of recurrent back-projection networks as its generator. Furthermore, to improve the "naturality" of the super-resolved output while eliminating artifacts seen with traditional algorithms, we utilize the discriminator from the super-resolution generative adversarial network (SRGAN). Although mean squared error (MSE) as a primary loss-minimization objective improves PSNR/SSIM, these metrics may not capture fine details in the image, resulting in misrepresentation of perceptual quality. To address this, we use a four-fold (MSE, perceptual, adversarial, and total-variation) loss function. Our results demonstrate that iSeeBetter offers superior VSR fidelity and surpasses state-of-the-art performance.
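As a rough illustration of how the four loss terms named in the abstract might be combined, and how PSNR follows from MSE, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the generator-side adversarial form -log D(G(x)), the treatment of the perceptual term as an MSE between caller-supplied feature maps, and the weights are all assumptions chosen for illustration.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized 2-D arrays (lists of rows)."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def total_variation(img):
    """Anisotropic total variation, normalised by the number of pixels."""
    h, w = len(img), len(img[0])
    dh = sum(abs(img[i + 1][j] - img[i][j]) for i in range(h - 1) for j in range(w))
    dw = sum(abs(img[i][j + 1] - img[i][j]) for i in range(h) for j in range(w - 1))
    return (dh + dw) / (h * w)


def adversarial_loss(d_fake):
    """Assumed generator-side GAN term: -log D(G(x)), with D's score in (0, 1]."""
    return -math.log(max(d_fake, 1e-12))


def four_fold_loss(sr, hr, feat_sr, feat_hr, d_fake,
                   weights=(1.0, 0.5, 0.25, 0.125)):
    """Weighted sum of MSE, perceptual, adversarial, and total-variation terms.

    sr, hr           : super-resolved and ground-truth frames
    feat_sr, feat_hr : feature maps of those frames from some fixed network
                       (hypothetical inputs; how they are extracted is up to the caller)
    d_fake           : discriminator score for the super-resolved frame
    weights          : illustrative placeholders, not values from the paper
    """
    w_mse, w_perc, w_adv, w_tv = weights
    return (w_mse * mse(sr, hr)
            + w_perc * mse(feat_sr, feat_hr)
            + w_adv * adversarial_loss(d_fake)
            + w_tv * total_variation(sr))


if __name__ == "__main__":
    sr = [[0.0, 0.5], [0.5, 1.0]]
    hr = [[0.0, 0.5], [0.5, 0.5]]
    print("MSE :", mse(sr, hr))
    print("PSNR:", psnr(sr, hr))
    print("Loss:", four_fold_loss(sr, hr, sr, hr, d_fake=0.9))
```

Keeping the terms as separate functions makes it straightforward to log each one during training and to see why a model tuned on MSE alone can score well on PSNR while still producing overly smooth output.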



About this article


Publication history

Received: 21 February 2020

Accepted: 23 April 2020

Published: 20 July 2020

Issue date: September 2020


Acknowledgements

The authors would like to thank Andrew Ng’s lab at Stanford University for their guidance on this project. In particular, the authors express their gratitude to Mohamed El-Geish for the idea-inducing brainstorming sessions throughout the project.

Rights and permissions

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.