Volume 6, Issue 1

VR content creation and exploration with deep learning: A survey

Miao Wang (1,2), Xu-Quan Lyu (1), Yi-Jun Li (1), Fang-Lue Zhang (3)
(1) State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China.
(2) Peng Cheng Laboratory, Shenzhen 518000, China.
(3) School of Engineering and Computer Science, Victoria University of Wellington, New Zealand.

Abstract

Virtual reality (VR) offers an artificial, computer-generated simulation of a real-life environment. It originated in the 1960s and has evolved to provide increasing immersion, interactivity, imagination, and intelligence. Because deep learning systems can represent and compose information at multiple levels of a deep hierarchy, they can build powerful models that leverage large quantities of visual media data. Recent developments in deep learning techniques have significantly boosted the intelligence of VR methods and applications. VR content creation and exploration relate to image and video analysis, synthesis, and editing, so deep learning methods such as fully convolutional networks and generative adversarial networks are widely employed, designed specifically to handle panoramic images and video and virtual 3D scenes. This article surveys recent research that uses such deep learning methods for VR content creation and exploration. It considers the problems involved and discusses possible future directions in this active and emerging research area.

Keywords: deep learning, virtual reality, neural networks, 360° image and video, virtual content

References(175)

[1]
Oculus Rift. Available at https://www.oculus.com/.
[2]
HTC Vive. Available at https://www.vive.com/cn/.
[3]
R. Szeliski, Image alignment and stitching: A tutorial. Foundations and Trends®in Computer Graphics and Vision Vol. 2, No. 1, 1-104, 2006.
[4]
N. Snavely,; S. M. Seitz,; R. Szeliski, Photo tourism: Exploring photo collections in 3D. ACM Transactions on Graphics Vol. 25, No. 3, 835-846, 2006.
[5]
J. Huang,; X. Shi,; X. Liu,; K. Zhou,; L.-Y. Wei,; S.-H. Teng,; H. Bao,; B. Guo,; H.-Y. Shum, Subspace gradient domain mesh deformation. ACM Transactions on Graphics Vol. 25, No. 3, 1126-1134, 2006.
[6]
K. Xu,; K. Chen,; H. Fu,; W.-L. Sun,; S.-M. Hu,Sketch2Scene: Sketch-based co-retrieval and co-placement of 3D models. ACM Transactions on Graphics Vol. 32, No. 4, Article No. 123, 2013.
[7]
J. H. Nah,; Y. Lim,; S. Ki,; C. Shin, Z2 traversal order: An interleaving approach for VR stereo rendering on tile-based GPUs. Computational Visual Media Vol. 3, No. 4, 349-357, 2017.
[8]
J. Redmon,; S. Divvala,; R. Girshick,; A. Farhadi, You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 779-788, 2016.
DOI
[9]
K. He,; G. Gkioxari,; P. Dollár,; R. Girshick, Mask R-CNN In: Proceedings of the IEEE International Conference on Computer Vision, 2961-2969, 2017.
DOI
[10]
J. Long,; E. Shelhamer,; T. Darrell, Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431-3440, 2015.
DOI
[11]
B. Zhou,; H. Zhao,; X. Puig,; S. Fidler,; A. Barriuso,; A. Torralba, Scene parsing through ADE20k dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 633-641, 2017.
DOI
[12]
H. Zhao,; J. Shi,; X. Qi,; X. Wang,; J. Jia, Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2881-2890, 2017.
DOI
[13]
D. Xu,; Y. Zhu,; C. B. Choy,; L. Fei-Fei, Scene graph generation by iterative message passing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5410-5419, 2017.
DOI
[14]
B. Dai,; Y. Zhang,; D. Lin, Detecting visual relationships with deep relational networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3076-3086, 2017.
DOI
[15]
L. A. Gatys,; A. S. Ecker,; M. Bethge, Image style transfer using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2414-2423, 2016.
DOI
[16]
J. Johnson,; A. Alahi,; F. F. Li, Perceptual losses for real-time style transfer and super-resolution. In: Computer Vision - ECCV 2016. Lecture Notes in Computer Science, Vol. 9906. B. Leibe,; J. Matas,; N. Sebe,; M. Welling, Eds. Springer Cham, 694-711, 2016.
[17]
F. Luan,; S. Paris,; E. Shechtman,; K. Bala, Deep photo style transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4990-4998, 2017.
DOI
[18]
P. Isola,; J. Zhu,; T. Zhou,; A. A. Efros, Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125-1134, 2017.
DOI
[19]
J. Y. Zhu,; T. Park,; P. Isola,; A. A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, 2242-2251, 2017.
DOI
[20]
Y. Choi,; M. Choi,; M. Kim,; J. W. Ha,; S. Kim,; J. Choo, StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8789-8797, 2018.
DOI
[21]
M. Wang,; G.-Y. Yang,; R. Li,; R.-Z. Liang,; S.-H. Zhang,; P. M. Hall,; S.-M. Hu, Example-guided style-consistent image synthesis from semantic labeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1495-1504, 2019.
DOI
[22]
W.-S. Lai,; O. Gallo,; J. Gu,; D. Sun,; M.-H. Yang,; J. Kantz, Video stitching for linear camera arrays. In: Proceedings of the British Machine Vision Conference, 2019.
[23]
T. Rhee,; L. Petikam,; B. Allen,; A. Chalmers, MR360: Mixed reality rendering for 360 panoramic videos. IEEE Transactions on Visualization and Computer Graphics Vol. 23, No. 4, 1379-1388, 2017.
[24]
R. Anderson,; D. Gallup,; J. T. Barron,; J. Kontkanen,; N. Snavely,; C. Hernández,; S. Agarwal,; S. M. Seitz, Jump: Virtual reality video. ACM Transactions on Graphics Vol. 35, No. 6, Article No. 198, 2016.
[25]
R. S. Overbeck,; D. Erickson,; D. Evangelakos,; M. Pharr,; P. Debevec, A system for acquiring, processing, and rendering panoramic light field stills for virtual reality. ACM Transactions on Graphics Vol. 37, No. 6, Article No. 197, 2019.
[26]
C. Schroers,; J. C. Bazin,; A. Sorkine-Hornung, An omnistereoscopic video pipeline for capture and display of real-world VR. ACM Transactions on Graphics Vol. 37, No. 3, Article No. 37, 2018.
[27]
K. Matzen,; M. F. Cohen,; B. Evans,; J. Kopf,; R. Szeliski, Low-cost 360 stereo photography and video capture. ACM Transactions on Graphics Vol. 36, No. 4, Article No. 148, 2017.
[28]
T. Bertel,; N. D. F. Campbell,; C. Richardt, MegaParallax: Casual 360 panoramas with motion parallax. IEEE Transactions on Visualization and Computer Graphics Vol. 25, No. 5, 1828-1835, 2019.
[29]
P. Hedman,; S. Alsisan,; R. Szeliski,; J. Kopf, Casual 3D photography. ACM Transactions on Graphics Vol. 36, No. 6, Article No. 234, 2017.
[30]
P. Hedman,; J. Kopf, Instant 3D photography. ACM Transactions on Graphics Vol. 37, No. 4, Article No. 101, 2018.
[31]
L. Wei,; Z. Zhong,; C. Lang,; Z. Yi, A survey on image and video stitching. Virtual Reality & Intelligent Hardware Vol. 1, No. 1, 55-83, 2019.
[32]
M. Brown,; D. G. Lowe, Automatic panoramic image stitching using invariant features. International Journal of Computer Vision Vol. 74, No. 1, 59-73, 2007.
[33]
Y. Zhang,; Y. K. Lai,; F. L. Zhang, Content-preserving image stitching with piecewise rectangular boundary constraints. IEEE Transactions on Visualization and Computer Graphics , 2020.
[34]
Y. Zhang,; Y. K. Lai,; F. L. Zhang, Stereoscopic image stitching with rectangular boundaries. The Visual Computer Vol. 35, Nos. 6-8, 823-835, 2019.
[35]
Z. Zhu,; J. M. Lu,; M. X. Wang,; S. H. Zhang,; R. R. Martin,; H. T. Liu,; et al. A comparative study of algorithms for realtime panoramic video blending. IEEE Transactions on Image Processing Vol. 27, No. 6, 2952-2965, 2018.
[36]
H. Altwaijry,; A. Veit,; S. Belongie, Learning to detect and match keypoints with deep architectures. In: Proceedings of the British Machine Vision Conference, 2016.
DOI
[37]
V. Balntas,; K. Lenc,; A. Vedaldi,; K. Mikolajczyk, HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3852-3861, 2017.
DOI
[38]
D. DeTone,; T. Malisiewicz,; A. Rabinovich, Deep image homography estimation. arXiv preprint arXiv:1606.03798, 2016.
[39]
T. Nguyen,; S. W. Chen,; S. S. Shivakumar,; C. J. Taylor,; V. Kumar, Unsupervised deep homography: A fast and robust homography estimation model. IEEE Robotics and Automation Letters Vol. 3, No. 3, 2346-2353, 2018.
[40]
J. Zhang,; C. Wang,; S. Liu,; L. Jia,; J. Wang,; J. Zhou, Content-aware unsupervised deep homography estimation. arXiv preprint arXiv:1909.05983, 2019.
DOI
[41]
N. Ye,; C. Wang,; S. Liu,; L. Jia,; J. Wang,; Y. Cui, DeepMeshFlow: Content adaptive mesh deformation for robust image registration. arXiv preprint arXiv:1912.05131, 2019.
[42]
J. Revaud,; P. Weinzaepfel,; Z. Harchaoui,; C. Schmid, DeepMatching: Hierarchical deformable dense matching. International Journal of Computer Vision Vol. 120, No. 3, 300-323, 2016.
[43]
P. Weinzaepfel,; J. Revaud,; Z. Harchaoui,; C. Schmid, DeepFlow: Large displacement optical flow with deep matching. In: Proceedings of the IEEE International Conference on Computer Vision, 1385-1392, 2013.
DOI
[44]
E. Ilg,; N. Mayer,; T. Saikia,; M. Keuper,; A. Dosovitskiy,; T. Brox, FlowNet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1647-1655, 2017.
DOI
[45]
Z. G. Tu,; W. Xie,; D. J. Zhang,; R. Poppe,; R. C. Veltkamp,; B. X. Li,; J. Yuan, A survey of variational and CNN-based optical flow techniques. Signal Processing: Image Communication Vol. 72, 9-24, 2019.
[46]
K. M. Lin,; S. C. Liu,; L. F. Cheong,; B. Zeng, Seamless video stitching from hand-held camera inputs. Computer Graphics Forum Vol. 35, No. 2, 479-487, 2016.
[47]
M. Wang,; A. Shamir,; G. Y. Yang,; J. K. Lin,; G. W. Yang,; S. P. Lu,; S.-M. Hu, BiggerSelfie: Selfie video expansion with hand-held camera. IEEE Transactions on Image Processing Vol. 27, No. 12, 5854-5865, 2018.
[48]
R. Jung,; A. S. J. Lee,; A. Ashtari,; J. C. Bazin, Deep360Up: A deep learning-based approach for automatic VR image upright adjustment. In: Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces, 1-8, 2019.
DOI
[49]
J. X. Xiao,; K. A. Ehinger,; A. Oliva,; A. Torralba, Recognizing scene viewpoint using panoramic place representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2695-2702, 2012.
[50]
Y. Furukawa,; J. Ponce, Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 32, No. 8, 1362-1376, 2010.
[51]
M. Goesele,; N. Snavely,; B. Curless,; H. Hoppe,; S. M. Seitz, Multi-view stereo for community photo collections. In: Proceedings of the IEEE 11th International Conference on Computer Vision, 1-8, 2007.
DOI
[52]
M. Q. Ji,; J. Gall,; H. T. Zheng,; Y. B. Liu,; L. Fang, SurfaceNet: An end-to-end 3D neural network for multiview stereopsis. In: Proceedings of the IEEE International Conference on Computer Vision, 2326-2334, 2017.
DOI
[53]
B. Ummenhofer; T. Brox, Global, dense multiscale reconstruction for a billion points. In: Proceedings of the IEEE International Conference on Computer Vision, 1341-1349, 2015.
DOI
[54]
M. Jancosek,; T. Pajdla, Multi-view reconstruction preserving weakly-supported surfaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3121-3128, 2011.
DOI
[55]
W. J. Xi,; X. J. Chen, Reconstructing piecewise planar scenes with multi-view regularization. Computational Visual Media Vol. 5, No. 4, 337-345, 2019.
[56]
A. Knapitsch,; J. Park,; Q.-Y. Zhou,; V. Koltun, Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics Vol. 36, No. 4, Article No. 78, 2017.
[57]
C. Buehler,; M. Bosse,; L. McMillan,; S. Gortler,; M. Cohen, Unstructured lumigraph rendering. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, 425-432, 2001.
DOI
[58]
J. Flynn,; I. Neulander,; J. Philbin,; N. Snavely, Deep stereo: Learning to predict new views from the world’s imagery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5515-5524, 2016.
DOI
[59]
T. H. Zhou,; S. Tulsiani,; W. L. Sun,; J. Malik,; A. A. Efros, View synthesis by appearance flow. In: Computer Vision - ECCV 2016. Lecture Notes in Computer Science, Vol. 9908. B. Leibe,; J. Matas,; N. Sebe,; M. Welling, Eds. Springer Cham, 286-301, 2016.
[60]
J. Flynn,; M. Broxton,; P. Debevec,; M. DuVall,; G. Fyffe,; R. Overbeck,; N. Snavely,; R. Tucker, DeepView: View synthesis with learned gradient descent. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2367-2376, 2019.
DOI
[61]
P. Hedman,; J. Philip,; T. Price,; J. M. Frahm,; G. Drettakis,; G. Brostow, Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics Vol. 37, No. 6, Article No. 257, 2018.
[62]
M. C. Trinidad,; R. M. Brualla,; F. Kainz,; J. Kontkanen, Multi-view image fusion. In: Proceedings of the IEEE International Conference on Computer Vision, 4101-4110, 2019.
DOI
[63]
Introducing vr180 cameras. Available at https://vr.google.com/vr180/.
[64]
A. Tewari,; M. Zollhofer,; H. Kim,; P. Garrido,; F. Bernard,; P. Perez,; C. Theobalt, MoFA: Model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In: Proceedings of the IEEE International Conference on Computer Vision, 1274-1283, 2017.
DOI
[65]
M. Zollhöfer,; J. Thies,; P. Garrido,; D. Bradley,; T. Beeler,; P. Pérez,; M. Stamminger,; M. Nießner,; C.. Theobalt, State of the art on monocular 3D face reconstruction, tracking, and applications. Computer Graphics Forum Vol. 37, No. 2, 523-550, 2018.
[66]
A. T. Tran,; T. Hassner,; I. Masi,; G. Medioni, Regressing robust and discriminative 3D morphable models with a very deep neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5163-5172, 2017.
DOI
[67]
V. Blanz,; T. Vetter, A morphable model for the synthesis of 3D faces. In: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, 187-194, 1999.
DOI
[68]
L. Hu,; S. Saito,; L. Wei,; K. Nagano,; J. Seo,; J. Fursund,; I. Sadeghi,; C. Sun,; Y.-C. Chen,; H. Li, Avatar digitization from a single image for real-time rendering. ACM Transactions on Graphics Vol. 36, No. 6, Article No. 195, 2017.
[69]
A. S. Jackson,; A. Bulat,; V. Argyriou,; G. Tzimiropoulos, Large pose 3D face reconstruction from a single image via direct volumetric CNN regression. In: Proceedings of the IEEE International Conference on Computer Vision, 1031-1039, 2017.
DOI
[70]
E. Richardson,; M. Sela,; R. Or-El,; R. Kimmel, Learning detailed face reconstruction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1259-1268, 2017.
DOI
[71]
P. Dou,; S. K. Shah,; I. A. Kakadiaris, End-to-end 3D face reconstruction with deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5908-5917, 2017.
DOI
[72]
H. Kim,; M. Zollhofer,; A. Tewari,; J. Thies,; C. Richardt,; C. Theobalt, InverseFaceNet: Deep monocular inverse face rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4625-4634, 2018.
DOI
[73]
A. T. Tran,; T. Hassner,; I. Masi,; E. Paz,; Y. Nirkin,; G. G. Medioni, Extreme 3D face reconstruction: Seeing through occlusions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3935-3944, 2018.
DOI
[74]
B. Gecer,; S. Ploumpis,; I. Kotsia,; S. Zafeiriou, GANFIT: Generative adversarial network fitting for high fidelity 3D face reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1155-1164, 2019.
DOI
[75]
S. Lombardi,; J. Saragih,; T. Simon,; Y. Sheikh, Deep appearance models for face rendering. ACM Transactions on Graphics Vol. 37, No. 4, Article No. 68, 2018.
[76]
P. F. Dou,; I. A. Kakadiaris, Multi-view 3D face reconstruction with deep recurrent neural networks. Image and Vision Computing Vol. 80, 80-91, 2018.
[77]
F. Wu,; L. Bao,; Y. Chen,; Y. Ling,; Y. Song,; S. Li,; K. N. Ngan,; W. Liu, MVF-Net: Multi-view 3D face morphable model regression. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 959-968, 2019.
DOI
[78]
Y. P. Cao,; Z. N. Liu,; Z. F. Kuang,; L. Kobbelt,; S.M. Hu, Learning to reconstruct high-quality 3D shapes with cascaded fully convolutional networks. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11213. V. Ferrari,; M. Hebert,; C. Sminchisescu,; Y. Weiss, Eds. Springer Cham, 626-643, 2018.
[79]
Z. Huang,; T. Y. Li,; W. K. Chen,; Y. J. Zhao,; J. Xing,; C. LeGendre,; L. Luo,; C. Ma,; H. Li, Deep volumetric video from very sparse multi-view performance capture. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11220. V. Ferrari,; M. Hebert,; C. Sminchisescu,; Y. Weiss, Eds. Springer Cham, 351-369, 2018.
DOI
[80]
Z. Zheng,; T. Yu,; Y. Wei,; Q. Dai,; Y. Liu, DeepHuman: 3D human reconstruction from a single image. In: Proceedings of the IEEE International Conference on Computer Vision, 7739-7749, 2019.
DOI
[81]
S. Saito,; Z. Huang,; R. Natsume,; S. Morishima,; H. Li,; A. Kanazawa, PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2304-2314, 2019.
DOI
[82]
L. Gao,; J. Yang,; Y. L. Qiao,; Y. K. Lai,; P. L. Rosin,; W. W. Xu,; S. Xia, Automatic unpaired shape deformation transfer. ACM Transactions on Graphics Vol. 37, No. 6, Article No. 237, 2018.
[83]
Q. Tan,; L. Gao,; Y.-K. Lai,; J. Yang,; S. Xia, Mesh-based autoencoders for localized deformation component analysis. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[84]
L. Gao,; Y. K. Lai,; J. Yang,; L. X. Zhang,; S. H. Xia,; L. Kobbelt, Sparse data driven mesh deformation. IEEE Transactions on Visualization and Computer Graphics , 2019.
[85]
H.-Y. Meng,; L. Gao,; Y.-K. Lai,; D. Manocha, VV-Net: Voxel VAE net with group convolutions for point cloud segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, 8500-8508, 2019.
DOI
[86]
Z. Wu,; X. Wang,; D. Lin,; D. Lischinski,; D. Cohen-Or,; H. Huang, SAGNet: Structure-aware generative network for 3D-shape modeling. ACM Transactions on Graphics Vol. 38, No. 4, Article No. 91, 2019.
[87]
K. Yin,; Z. Chen,; H. Huang,; D. Cohen-Or,; H. Zhang, LOGAN: Unpaired shape transform in latent overcomplete space. ACM Transactions on Graphics Vol. 38, No. 6, Article No. 198, 2019.
[88]
L. Gao,; J. Yang,; T. Wu,; Y.-J. Yuan,; H. Fu,; Y.-K. Lai,; H. Zhang, SDM-NET: Deep generative network for structured deformable mesh. ACM Transactions on Graphics Vol. 38, No. 6, Article No. 243, 2019.
[89]
Q. Fu,; X. W. Chen,; X. T. Wang,; S. J. Wen,; B. Zhou,; H. B. Fu, Adaptive synthesis of indoor scenes via activity-associated object relation graphs. ACM Transactions on Graphics Vol. 36, No. 6, Article No. 201, 2017.
[90]
K. Wang,; M. Savva,; A. X. Chang,; D. Ritchie, Deep convolutional priors for indoor scene synthesis. ACM Transactions on Graphics Vol. 37, No. 4, Article No. 70, 2018.
[91]
S. Song,; F. Yu,; A. Zeng,; A. X. Chang,; M. Savva,; T. Funkhouser, Semantic scene completion from a single depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1746-1754, 2017.
DOI
[92]
M. Li,; A. G. Patil,; K. Xu,; S. Chaudhuri,; O. Khan,; A. Shamir,; C. Tu,; B. Chen,; D. Cohen-Or,; H. Zhang, Grains: Generative recursive autoencoders for indoor scenes. ACM Transactions on Graphics Vol. 38, No. 2, Article No. 12, 2019.
[93]
J. Li,; K. Xu,; S. Chaudhuri,; E. Yumer,; H. Zhang,; L. Guibas, GRASS: Generative recursive autoencoders for shape structures. ACM Transactions on Graphics Vol. 36, No. 4, Article No. 52, 2017.
[94]
W. M. Wu,; X. M. Fu,; R. Tang,; Y. H. Wang,; Y. H. Qi,; L. G. Liu, Data-driven interior plan generation for residential buildings. ACM Transactions on Graphics Vol. 38, No. 6, Article No. 234, 2019.
[95]
D. Ritchie,; K. Wang,; Y.-A. Lin, Fast and flexible indoor scene synthesis via deep convolutional generative models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6182-6190, 2019.
DOI
[96]
X. Zhao,; R. Z. Hu,; H. S. Liu,; T. Komura,; X. Y. Yang, Localization and completion for 3D object interactions. IEEE Transactions on Visualization and Computer Graphics , 2019.
[97]
R. Z. Hu,; Z. H. Yan,; J. W. Zhang,; O. van Kaick,; A. Shamir,; H. Zhang,; H. Huang, Predictive and generative neural networks for object functionality. ACM Transactions on Graphics Vol. 37, No. 4, Article No. 151, 2018.
[98]
Z. Yan,; R. Hu,; X. Yan,; L. Chen,; O. Van Kaick,; H. Zhang,; H. Huang, RPM-Net: Recurrent prediction of motion and parts from point cloud. ACM Transactions on Graphics Vol. 38, No. 6, Article No. 240, 2019.
[99]
É. Guérin,; J. Digne,; É. Galin,; A. Peytavie,; C. Wolf,; B. Benes,; B. Martinez, Interactive example-based terrain authoring with conditional generative adversarial networks. ACM Transactions on Graphics Vol. 36, No. 6, Article No. 228, 2017.
[100]
J. Zhang,; C. B. Wang,; C. Li,; H. Qin, Example-based rapid generation of vegetation on terrain via CNN-based distribution learning. The Visual Computer Vol. 35, Nos. 6-8, 1181-1191, 2019.
[101]
Y.-C. Su,; K. Grauman, Learning spherical convolution for fast features from 360 imagery. In: Proceedings of the Advances in Neural Information Processing Systems 30, 529-539, 2017.
[102]
Z. H. Zhang,; Y. Y. Xu,; J. Y. Yu,; S. H. Gao, Saliency detection in 360 videos. In: Proceedings of the European Conference on Computer Vision, 488-503, 2018.
DOI
[103]
B. Coors,; A. P. Condurache,; A. Geiger, SphereNet: Learning spherical representations for detection and classification in omnidirectional images. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11213. V. Ferrari,; M. Hebert,; C. Sminchisescu,; Y. Weiss, Eds. Springer Cham, 518-533, 2018.
[104]
J. Li,; J. M. Su,; C. Q. Xia,; Y. H. Tian, Distortion-adaptive salient object detection in 360 omnidirectional images. IEEE Journal of Selected Topics in Signal Processing Vol. 14, No. 1, 38-48, 2020.
[105]
Y.-C. Su; K. Grauman, Kernel transformer networks for compact spherical convolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 9442-9451, 2019.
DOI
[106]
R. Monroy,; S. Lutz,; T. Chalasani,; A. Smolic, SalNet360: Saliency maps for omni-directional images with CNN. Signal Processing: Image Communication Vol. 69, 26-34, 2018.
[107]
H.-T. Cheng,; C.-H. Chao,; J.-D. Dong,; H.-K. Wen,; T.-L. Liu,; M. Sun, Cube padding for weakly-supervised saliency prediction in 360 videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1420-1429, 2018.
DOI
[108]
W. Yang,; Y. Qian,; J.-K. Kämäräinen,; F. Cricri,; L. Fan, Object detection in equirectangular panorama. In: Proceedings of the 24th International Conference on Pattern Recognition, 2190-2195, 2018.
DOI
[109]
J. Redmon; A. Farhadi, YOLO9000: Better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7263-7271, 2017.
DOI
[110]
Y. Lee,; J. Jeong,; J. Yun,; W. Cho,; K.-J. Yoon, SpherePHD: Applying CNNs on a spherical polyhedron representation of 360deg images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 9181-9189, 2019.
DOI
[111]
C. Zou,; A. Colburn,; Q. Shan,; D. Hoiem, LayoutNet: Reconstructing the 3D room layout from a single RGB image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2051-2059, 2018.
DOI
[112]
C. Sun,; C. W. Hsiao,; M. Sun,; H. T. Chen, HorizonNet: Learning room layout with 1D representation and pano stretch data augmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1047-1056, 2019.
DOI
[113]
S.-T. Yang,; F.-E. Wang,; C.-H. Peng,; P. Wonka,; M. Sun,; H.-K. Chu, DuLa-Net: A dual-projection network for estimating room layouts from a single RGB panorama. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3363-3372, 2019.
DOI
[114]
J. Kim,; W. Kim,; H. Oh,; S. Lee,; S. Lee, A deep cybersickness predictor based on brain signal analysis for virtual reality contents. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 10580-10589, 2019.
DOI
[115]
E. M. Kolasinski, Simulator sickness in virtual environments. Technical Report. Army Research Inst for the Behavioral and Social Sciences Alexandria VA, 1995.
[116]
M. Wang,; X. J. Zhang,; J. B. Liang,; S. H. Zhang,; R. R. Martin, Comfort-driven disparity adjustment for stereoscopic video. Computational Visual Media Vol. 2, No. 1, 3-17, 2016.
[117]
Y. H. Yu,; P. C. Lai,; L. W. Ko,; C. H. Chuang,; B. C. Kuo,; C. T. Lin, An EEG-based classification system of Passenger’s motion sickness level by using feature extraction/selection technologies. In: Proceedings of the International Joint Conference on Neural Networks, 1-6, 2010.
DOI
[118]
D. Jeong,; S. Yoo,; J. Yun, Cybersickness analysis with EEG using deep learning algorithms. In: Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces, 827-835, 2019.
DOI
[119]
T. M. Lee,; J. C. Yoon,; I. K. Lee, Motion sickness prediction in stereoscopic videos using 3D convolutional neural networks. IEEE Transactions on Visualization and Computer Graphics Vol. 25, No. 5, 1919-1927, 2019.
[120]
Y. Y. Wang,; J. R. Chardonnet,; F. Merienne, VR sickness prediction for navigation in immersive virtual environments using a deep long short term memory model. In: Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces, 1874-1881, 2019.
DOI
[121]
P. Hu,; Q. Sun,; P. Didyk,; L. Y. Wei,; A. E. Kaufman, Reducing simulator sickness with perceptual camera control. ACM Transactions on Graphics Vol. 38, No. 6, Article No. 210, 2019.
[122]
W. J. Gong,; X. N. Zhang,; J. Gonzàlez,; A. Sobral,; T. Bouwmans,; C. H. Tu,; E.-h. Zahzah, Human pose estimation from monocular images: A comprehensive survey. Sensors Vol. 16, No. 12, 1966, 2016.
[123]
A. Toshev,; C. Szegedy, DeepPose: Human pose estimation via deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1653-1660, 2014.
DOI
[124]
A. Newell,; K. Y. Yang,; J. Deng, Stacked hourglass networks for human pose estimation. In: Computer Vision - ECCV 2016. Lecture Notes in Computer Science, Vol. 9912. B. Leibe,; J. Matas,; N. Sebe,; M. Welling, Eds. Springer Cham, 483-499, 2016.
[125]
L. Pishchulin,; E. Insafutdinov,; S. Y. Tang,; B. Andres,; M. Andriluka,; P. Gehler,; B. Schiele, DeepCut: Joint subset partition and labeling for multi person pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4929-4937, 2016.
DOI
[126]
Z. Cao,; T. Simon,; S.-E. Wei,; Y. Sheikh, Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7291-7299, 2017.
DOI
[127]
H.-S. Fang,; S. Xie,; Y.-W. Tai,; C. Lu, RMPE: Regional multi-person pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, 2334-2343, 2017.
DOI
[128]
S. Jin,; W. Liu,; W. Ouyang,; C. Qian, Multi-person articulated tracking with spatial and temporal embeddings. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5664-5673, 2019.
DOI
[129]
F. Bogo,; A. Kanazawa,; C. Lassner,; P. Gehler,; J. Romero,; M. J. Black, Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In: Computer Vision - ECCV 2016. Lecture Notes in Computer Science, Vol. 9909. B. Leibe,; J. Matas,; N. Sebe,; M. Welling, Eds. Springer Cham, 561-578, 2016.
DOI
[130]
M. Loper,; N. Mahmood,; J. Romero,; G. Pons-Moll,; M. J. Black, SMPL: A skinned multi-person linear model. ACM Transactions on Graphics Vol. 34, No. 6, Article No. 248, 2015.
[131]
D. Mehta,; S. Sridhar,; O. Sotnychenko,; H. Rhodin,; M. Shafiei,; H.-P. Seidel,; W. Xu,; D. Casas,; C. Theobalt, VNect: Real-time 3D human pose estimation with a single RGB camera. ACM Transactions on Graphics Vol. 36, No. 4, Article No. 44, 2017.
[132]
D. Tome,; C. Russell,; L. Agapito, Lifting from the deep: Convolutional 3D pose estimation from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2500-2509, 2017.
DOI
[133]
B. Wandt,; B. Rosenhahn, RepNet: Weakly supervised training of an adversarial reprojection network for 3D human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7782-7791, 2019.
DOI
[134]
Y. Cheng,; B. Yang,; B. Wang,; W. Yan,; R. T. Tan, Occlusion-aware networks for 3D human pose estimation in video. In: Proceedings of the IEEE International Conference on Computer Vision, 723-732, 2019.
DOI
[135]
M. Oberweger,; P. Wohlhart,; V. Lepetit, Hands deep in deep learning for hand pose estimation. arXiv preprint arXiv:1502.06807, 2015.
[136]
X. Zhou,; Q. Wan,; W. Zhang,; X. Xue,; Y. Wei, Model-based deep hand pose estimation. arXiv preprint arXiv:1606.06854, 2016.
[137]
D. Pavllo,; T. Porssut,; B. Herbelin,; R. Boulic, Real-time marker-based finger tracking with neural networks. In: Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces, 651-652, 2018.
DOI
[138]
T. Chalasani,; J. Ondrej,; A. Smolic, Egocentric gesture recognition for head-mounted AR devices. In: Proceedings of the IEEE International Symposium on Mixed and Augmented Reality Adjunct, 109-114, 2018.
DOI
[139]
L. Ge,; Z. Ren,; Y. Li,; Z. Xue,; Y. Wang,; J. Cai,; J. Yuan, 3D hand shape and pose estimation from a single RGB image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 10833-10842, 2019.
DOI
[140]
A. M. Soccini, Gaze estimation based on head movements in virtual reality applications using deep learning. In: Proceedings of the IEEE Virtual Reality, 413-414, 2017.
DOI
[141]
Y. Xu,; Y. Dong,; J. Wu,; Z. Sun,; Z. Shi,; J. Yu,; S. Gao, Gaze prediction in dynamic 360 immersive videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5333-5342, 2018.
[142]
Y. Cheng,; S. Huang,; F. Wang,; C. Qian,; F. Lu, A coarse-to-fine adaptive network for appearance-based gaze estimation. arXiv preprint arXiv:2001.00187, 2020.
[143]
F. Lu,; Y. Gao,; X. W. Chen, Estimating 3D gaze directions using unlabeled eye images via synthetic iris appearance fitting. IEEE Transactions on Multimedia Vol. 18, No. 9, 1772-1782, 2016.
[144]
Y. H. Cheng,; F. Lu,; X. C. Zhang, Appearance-based gaze estimation via evaluation-guided asymmetric regression. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11218. V. Ferrari,; M. Hebert,; C. Sminchisescu,; Y. Weiss, Eds. Springer Cham, 105-121, 2018.
[145]
Y. Xiong,; H. J. Kim,; V. Singh, Mixed effects neural networks (MeNets) with applications to gaze estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7743-7752, 2019.
[146]
P. Isola,; J.-Y. Zhu,; T. Zhou,; A. A. Efros, Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125-1134, 2017.
[147]
J. Y. Zhu,; T. Park,; P. Isola,; A. A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, 2223-2232, 2017.
[148]
Y. Choi,; M. Choi,; M. Kim,; J. W. Ha,; S. Kim,; J. Choo, StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8789-8797, 2018.
[149]
J. Yu,; Z. Lin,; J. Yang,; X. Shen,; X. Lu,; T. S. Huang, Generative image inpainting with contextual attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5505-5514, 2018.
[150]
Y. Li,; S. Liu,; J. Yang,; M.-H. Yang, Generative face completion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3911-3919, 2017.
[151]
X. Wu,; R. L. Li,; F. L. Zhang,; J. C. Liu,; J. Wang,; A. Shamir,; S.-M. Hu, Deep portrait image completion and extrapolation. IEEE Transactions on Image Processing Vol. 29, 2344-2355, 2020.
[152]
X. Wu,; K. Xu,; P. Hall, A survey of image synthesis and editing with generative adversarial networks. Tsinghua Science and Technology Vol. 22, No. 6, 660-674, 2017.
[153]
H.-N. Hu,; Y.-C. Lin,; M.-Y. Liu,; H.-T. Cheng,; Y.-J. Chang,; M. Sun, Deep 360 pilot: Learning a deep agent for piloting through 360 sports videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1396-1405, 2017.
[154]
W. S. Lai,; Y. J. Huang,; N. Joshi,; C. Buehler,; M. H. Yang,; S. B. Kang, Semantic-driven generation of hyperlapse from 360 degree video. IEEE Transactions on Visualization and Computer Graphics Vol. 24, No. 9, 2610-2621, 2018.
[155]
Y. Yu,; S. Lee,; J. Na,; J. Kang,; G. Kim, A deep ranking model for spatio-temporal highlight detection from a 360 video. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[156]
S. Lee,; J. Sung,; Y. Yu,; G. Kim, A memory network approach for story-based temporal summarization of 360 videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1410-1419, 2018.
[157]
M. Wang,; X. Wen,; S.-M. Hu, Faithful face image completion for HMD occlusion removal. In: Proceedings of the IEEE International Symposium on Mixed and Augmented Reality Adjunct, 251-256, 2019.
[158]
J. Thies,; M. Zollhöfer,; M. Stamminger,; C. Theobalt,; M. Nießner, FaceVR: Real-time gaze-aware facial reenactment in virtual reality. ACM Transactions on Graphics Vol. 37, No. 2, Article No. 25, 2018.
[159]
K. Nakano,; D. Horita,; N. Sakata,; K. Kiyokawa,; K. Yanai,; T. Narumi, DeepTaste: Augmented reality gustatory manipulation with GAN-based real-time food-to-food translation. In: Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, 212-223, 2019.
[160]
M. Levoy,; R. Whitaker, Gaze-directed volume rendering. ACM SIGGRAPH Computer Graphics Vol. 24, No. 2, 217-223, 1990.
[161]
B. Guenter,; M. Finch,; S. Drucker,; D. Tan,; J. Snyder, Foveated 3D graphics. ACM Transactions on Graphics Vol. 31, No. 6, Article No. 164, 2012.
[162]
A. S. Kaplanyan,; A. Sochenov,; T. Leimkühler,; M. Okunev,; T. Goodall,; G. Rufo, DeepFovea: Neural reconstruction for foveated rendering and video compression using learned statistics of natural videos. ACM Transactions on Graphics Vol. 38, No. 6, Article No. 212, 2019.
[163]
H. Kim,; P. Garrido,; A. Tewari,; W. Xu,; J. Thies,; M. Nießner,; P. Pérez,; C. Richardt,; M. Zollhöfer,; C. Theobalt, Deep video portraits. ACM Transactions on Graphics Vol. 37, No. 4, Article No. 163, 2018.
[164]
J. Thies,; M. Zollhöfer,; M. Stamminger,; C. Theobalt,; M. Nießner, Face2Face: Real-time face capture and reenactment of RGB videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2387-2395, 2016.
[165]
R. W. Sumner,; J. Popović, Deformation transfer for triangle meshes. ACM Transactions on Graphics Vol. 23, No. 3, 399-405, 2004.
[166]
K. Olszewski,; J. J. Lim,; S. Saito,; H. Li, High-fidelity facial and speech animation for VR HMDs. ACM Transactions on Graphics Vol. 35, No. 6, Article No. 221, 2016.
[167]
S. Suwajanakorn,; S. M. Seitz,; I. Kemelmacher-Shlizerman, Synthesizing Obama: Learning lip sync from audio. ACM Transactions on Graphics Vol. 36, No. 4, Article No. 95, 2017.
[168]
W. Wu,; Y. X. Zhang,; C. Li,; C. Qian,; C. C. Loy, ReenactGAN: Learning to reenact faces via boundary transfer. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11205. V. Ferrari,; M. Hebert,; C. Sminchisescu,; Y. Weiss, Eds. Springer Cham, 622-638, 2018.
[169]
Y. Nirkin,; Y. Keller,; T. Hassner, FSGAN: Subject agnostic face swapping and reenactment. In: Proceedings of the IEEE International Conference on Computer Vision, 7184-7193, 2019.
[170]
J. H. Geng,; T. J. Shao,; Y. Y. Zheng,; Y. L. Weng,; K. Zhou, Warp-guided GANs for single-photo facial animation. ACM Transactions on Graphics Vol. 37, No. 6, Article No. 231, 2019.
[171]
H. Kim,; M. Elgharib,; M. Zollhöfer,; H. P. Seidel,; T. Beeler,; C. Richardt,; C. Theobalt, Neural style-preserving visual dubbing. ACM Transactions on Graphics Vol. 38, No. 6, Article No. 178, 2019.
[172]
J. W. Huang,; Z. L. Chen,; D. Ceylan,; H. L. Jin, 6-DOF VR videos with a single 360-camera. In: Proceedings of the IEEE Virtual Reality, 37-44, 2017.
[173]
A. Serrano,; I. Kim,; Z. L. Chen,; S. DiVerdi,; D. Gutierrez,; A. Hertzmann,; B. Masia, Motion parallax for 360 RGBD video. IEEE Transactions on Visualization and Computer Graphics Vol. 25, No. 5, 1817-1827, 2019.
[174]
T. Park,; M.-Y. Liu,; T.-C. Wang,; J.-Y. Zhu, Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2337-2346, 2019.
[175]
Z. Wu,; S. Pan,; F. Chen,; G. Long,; C. Zhang,; P. S. Yu, A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596, 2019.

Publication history

Revised: 30 January 2020
Accepted: 02 March 2020
Published: 23 March 2020
Issue date: March 2020

Copyright

© The author(s) 2020

Acknowledgements

The authors would like to thank the reviewers. This work was supported by the National Natural Science Foundation of China (Grant Nos. 61902012, 61932003). Fang-Lue Zhang was supported by a Victoria Early-Career Research Excellence Award.

Rights and permissions

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
