Volume 6, Issue 3




Coherent video generation for multiple hand-held cameras with dynamic foreground

Fang-Lue Zhang1, Connelly Barnes2, Hao-Tian Zhang3, Junhong Zhao1, Gabriel Salas1
1 Victoria University of Wellington, Wellington 6012, New Zealand
2 Adobe Research, Seattle, USA
3 Stanford University, Stanford, USA

Abstract

For many social events such as public performances, multiple hand-held cameras may capture the same event. This footage is often collected by amateur cinematographers who typically have little control over the scene and may not pay close attention to the camera. For these reasons, each individually captured video may fail to cover the whole time of the event, or may lose track of interesting foreground content such as a performer. We introduce a new algorithm that can synthesize a single smooth video sequence of moving foreground objects captured by multiple hand-held cameras. This allows later viewers to gain a cohesive narrative experience that can transition between different cameras, even though the input footage may be less than ideal. We first introduce a graph-based method for selecting a good transition route. This allows us to automatically select good cut points for the hand-held videos, so that smooth transitions can be created between the resulting video shots. We also propose a method to synthesize a smooth photorealistic transition video between each pair of hand-held cameras, which preserves dynamic foreground content during this transition. Our experiments demonstrate that our method outperforms previous state-of-the-art methods, which struggle to preserve dynamic foreground content.
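The paper's graph-based transition-route selection is not detailed on this page, but the core idea it describes (automatically choosing which camera to show at each moment, with good cut points between shots) can be sketched as a shortest-path search over a (frame, camera) graph. The function and cost terms below are illustrative assumptions for this sketch, not the authors' actual formulation: `quality[c][t]` is a per-frame penalty for showing camera `c` at frame `t` (e.g., high when the performer is lost), and `transition_cost(c1, c2, t)` penalizes cutting between cameras at a bad moment.

```python
import heapq

def select_transition_route(num_cameras, num_frames, quality, transition_cost):
    """Pick one camera per frame so total footage penalty is low and
    cross-camera cuts are cheap, via Dijkstra over a (frame, camera) graph.
    Nodes are (frame, camera); each edge advances one frame, either staying
    on the same camera or cutting to another one."""
    dist = {(0, c): quality[c][0] for c in range(num_cameras)}
    prev = {}
    heap = [(d, node) for node, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, (t, c) = heapq.heappop(heap)
        if d > dist.get((t, c), float("inf")):
            continue  # stale heap entry
        if t == num_frames - 1:
            continue  # final frame: no outgoing edges
        for c2 in range(num_cameras):
            cut = 0.0 if c2 == c else transition_cost(c, c2, t)
            nd = d + cut + quality[c2][t + 1]
            if nd < dist.get((t + 1, c2), float("inf")):
                dist[(t + 1, c2)] = nd
                prev[(t + 1, c2)] = (t, c)
                heapq.heappush(heap, (nd, (t + 1, c2)))
    # Trace the cheapest route back from the best final-frame camera.
    end = min((dist[(num_frames - 1, c)], c) for c in range(num_cameras))[1]
    route, node = [], (num_frames - 1, end)
    while node in prev:
        route.append(node[1])
        node = prev[node]
    route.append(node[1])
    return route[::-1]
```

For example, with two cameras over four frames, where camera 0 covers the action well early and camera 1 late, the route cuts once at the point where the second camera becomes better, provided the cut cost is smaller than the quality gained.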

Keywords: video editing, smooth temporal transitions, dynamic foreground, multiple cameras, hand-held cameras

Electronic supplementary material

Video: 41095_2020_187_MOESM1_ESM.mp4

Publication history

Received: 30 June 2020
Accepted: 16 July 2020
Published: 03 September 2020
Issue date: September 2020

Copyright

© The Author(s) 2020

Acknowledgements

This work was supported by a Research Establishment Grant of Victoria University of Wellington (Project No. 8-1620-216786-3744) and a Victoria Research Excellence Award.

Rights and permissions

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
