Volume 6, Issue 3


iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks

Aman Chadha (1), John Britto (2), M. Mani Roja (3)
(1) Department of Computer Science, Stanford University, 450 Serra Mall, Stanford, CA 94305, USA
(2) Department of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003, USA
(3) Department of Electronics and Telecommunication Engineering, University of Mumbai, Mumbai, Maharashtra 400032, India

Abstract

Recently, learning-based models have enhanced the performance of single-image super-resolution (SISR). However, applying SISR successively to each video frame leads to a lack of temporal coherency. Convolutional neural networks (CNNs) outperform traditional approaches in terms of image quality metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). On the other hand, generative adversarial networks (GANs) offer a competitive advantage by mitigating the lack of finer texture details usually seen with CNNs when super-resolving at large upscaling factors. We present iSeeBetter, a novel GAN-based spatio-temporal approach to video super-resolution (VSR) that renders temporally consistent super-resolution videos. iSeeBetter extracts spatial and temporal information from the current and neighboring frames using the concept of recurrent back-projection networks as its generator. Furthermore, to improve the "naturality" of the super-resolved output while eliminating artifacts seen with traditional algorithms, we utilize the discriminator from the super-resolution generative adversarial network (SRGAN). Although using mean squared error (MSE) as the primary loss-minimization objective improves PSNR/SSIM, these metrics may not capture fine details in the image, resulting in a misrepresentation of perceptual quality. To address this, we use a four-fold (MSE, perceptual, adversarial, and total-variation) loss function. Our results demonstrate that iSeeBetter offers superior VSR fidelity and surpasses state-of-the-art performance.
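
As a rough illustration of the quantities named in the abstract, the sketch below computes MSE, the PSNR derived from it, and one common (anisotropic) form of total variation on small grayscale images stored as lists of rows. It then combines four loss terms as a weighted sum. The function names, the choice of anisotropic total variation, and the weights are assumptions made for illustration only; they are not taken from the paper. The perceptual and adversarial terms depend on trained networks, so here they are passed in as precomputed scalars.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def total_variation(img):
    """Anisotropic total variation: sum of absolute neighbour differences."""
    h, w = len(img), len(img[0])
    vertical = sum(abs(img[i + 1][j] - img[i][j]) for i in range(h - 1) for j in range(w))
    horizontal = sum(abs(img[i][j + 1] - img[i][j]) for i in range(h) for j in range(w - 1))
    return vertical + horizontal


def combined_loss(sr, hr, perceptual, adversarial, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of MSE, perceptual, adversarial, and total-variation terms.

    `perceptual` and `adversarial` are precomputed scalars (hypothetical inputs).
    The default weights are placeholders, not values from the paper.
    """
    w_mse, w_perc, w_adv, w_tv = weights
    return (w_mse * mse(sr, hr)
            + w_perc * perceptual
            + w_adv * adversarial
            + w_tv * total_variation(sr))


if __name__ == "__main__":
    hr = [[0, 255], [255, 0]]   # toy "ground truth" frame
    sr = [[0, 255], [255, 10]]  # toy "super-resolved" frame with one wrong pixel
    print("MSE:", mse(sr, hr))
    print("PSNR (dB):", round(psnr(sr, hr), 2))
    print("TV:", total_variation(sr))
```

Because PSNR is a monotonic function of MSE, minimizing MSE alone maximizes PSNR directly. This is why the abstract adds terms that target perceptual quality instead.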

Keywords: super resolution, video upscaling, frame recurrence, optical flow, generative adversarial networks, convolutional neural networks


Publication history

Received: 21 February 2020
Accepted: 23 April 2020
Published: 20 July 2020
Issue date: September 2020

Copyright

© The Author(s) 2020

Acknowledgements

The authors would like to thank Andrew Ng’s lab at Stanford University for their guidance on this project. In particular, the authors express their gratitude to Mohamed El-Geish for the idea-inducing brainstorming sessions throughout the project.

Rights and permissions

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
