Regular Paper

SG-NeRF: Sparse-Input Generalized Neural Radiance Fields for Novel View Synthesis

School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China
Intelligent Big Data System (iBDSys) Lab, Shanghai Jiao Tong University, Shanghai 200240, China
School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China

Abstract

Traditional neural radiance fields for novel view rendering require dense input images and per-scene optimization, which limits their practical applications. We propose SG-NeRF (Sparse-Input Generalized Neural Radiance Fields), a generalization method that infers scenes from sparse input images and performs high-quality rendering without per-scene optimization. First, we construct an improved multi-view stereo structure based on convolutional attention and a multi-level fusion mechanism to extract the geometric and appearance features of the scene from the sparse input images; these features are then aggregated by multi-head attention and fed into the neural radiance fields. This strategy of using the neural radiance fields to decode scene features, instead of mapping positions and view directions, enables cross-scene training and inference, allowing the neural radiance fields to generalize to novel view synthesis on unseen scenes. We evaluated the generalization ability on the DTU dataset: under the same input conditions, our PSNR (peak signal-to-noise ratio) improves by 3.14 over the baseline method. In addition, when dense input views are available for a scene, a short refinement training further improves the average PSNR by 1.04 and yields higher-quality renderings.
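The central idea described above, aggregating per-view scene features with multi-head attention and letting the radiance field decode those fused features rather than raw positions and view directions, can be sketched in a few lines. The following is a minimal illustration only, assuming PyTorch; the class name, feature dimensions, and MLP layout (FeatureAggregationNeRF, 32-dim features, a two-layer decoder) are assumptions made for exposition and do not reproduce SG-NeRF's actual architecture.

```python
# Minimal sketch (assumption: PyTorch) of the aggregation idea in the abstract:
# per-view geometric/appearance features sampled at a 3D point are fused with
# multi-head attention, and a small MLP decodes the fused feature into density
# and color. Dimensions and layer sizes are illustrative only.
import torch
import torch.nn as nn


class FeatureAggregationNeRF(nn.Module):
    def __init__(self, feat_dim=32, num_heads=4, hidden_dim=128):
        super().__init__()
        # Multi-head attention over the source views of each sample point.
        self.view_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # MLP decodes the aggregated scene feature (not raw positions/directions).
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden_dim, 1)
        self.color_head = nn.Linear(hidden_dim, 3)

    def forward(self, view_feats):
        # view_feats: (num_points, num_views, feat_dim) features gathered from
        # the multi-view stereo feature volumes for each 3D sample point.
        fused, _ = self.view_attn(view_feats, view_feats, view_feats)
        fused = fused.mean(dim=1)                 # pool across source views
        h = self.decoder(fused)
        sigma = torch.relu(self.density_head(h))  # volume density
        rgb = torch.sigmoid(self.color_head(h))   # color in [0, 1]
        return sigma, rgb


# Usage: 1024 ray samples, each observed by 3 sparse input views, 32-dim features.
model = FeatureAggregationNeRF()
feats = torch.randn(1024, 3, 32)
sigma, rgb = model(feats)
```

Because the network conditions on image-derived features rather than memorizing a single scene, the same weights can be applied to unseen scenes at inference time, which is what makes the cross-scene generalization described in the abstract possible.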

Journal of Computer Science and Technology
Pages 785-797
Cite this article:
Xu K, Li J, Li Z-Q, et al. SG-NeRF: Sparse-Input Generalized Neural Radiance Fields for Novel View Synthesis. Journal of Computer Science and Technology, 2024, 39(4): 785-797. https://doi.org/10.1007/s11390-024-4157-6


Received: 29 January 2024
Accepted: 29 March 2024
Published: 26 June 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024