Traditional neural radiance fields for novel view synthesis require dense input images and per-scene optimization, which limits their practical applications. We propose SG-NeRF (Sparse-Input Generalized Neural Radiance Fields), a generalization method that infers a scene from sparse input images and renders high-quality novel views without per-scene optimization. First, we construct an improved multi-view stereo structure based on convolutional attention and a multi-level fusion mechanism to extract the geometric and appearance features of the scene from the sparse input images; these features are then aggregated by multi-head attention and fed to the neural radiance field as input. This strategy of having the neural radiance field decode scene features, rather than map positions and orientations directly, enables cross-scene training and inference, so the radiance field generalizes to novel view synthesis on unseen scenes. We evaluated the generalization ability on the DTU dataset: under the same input conditions, our PSNR (peak signal-to-noise ratio) improves by 3.14 dB over the baseline method. In addition, when dense input views are available, a short refinement training further improves the average PSNR by 1.04 dB and yields higher-quality renderings.
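A minimal sketch of the aggregation-and-decoding idea described in the abstract, assuming PyTorch: per-view features sampled for each 3D point are fused by multi-head attention, and the fused feature (together with the viewing direction, rather than a raw coordinate encoding alone) conditions a small radiance-field MLP. All module names, dimensions, and the learned pooling query are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class FeatureAggregatedNeRF(nn.Module):
    """Sketch: fuse per-view features with multi-head attention, then
    decode density and color with a small MLP (illustrative only)."""

    def __init__(self, feat_dim=32, num_heads=4, hidden_dim=128):
        super().__init__()
        # Multi-head attention over the N source views of each sample point.
        self.view_attn = nn.MultiheadAttention(feat_dim, num_heads,
                                               batch_first=True)
        # Learned query token that pools per-view features into one vector
        # (one of several plausible aggregation choices; assumed here).
        self.query = nn.Parameter(torch.randn(1, 1, feat_dim))
        # Small MLP decoding the fused feature into density (1) + RGB (3).
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden_dim),  # fused feature + view dir
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 4),
        )

    def forward(self, view_feats, view_dirs):
        # view_feats: (P, N, feat_dim) features of P points from N views
        # view_dirs:  (P, 3) normalized viewing direction per point
        P = view_feats.shape[0]
        q = self.query.expand(P, -1, -1)              # (P, 1, feat_dim)
        fused, _ = self.view_attn(q, view_feats, view_feats)
        fused = fused.squeeze(1)                      # (P, feat_dim)
        out = self.decoder(torch.cat([fused, view_dirs], dim=-1))
        sigma = torch.relu(out[..., :1])              # non-negative density
        rgb = torch.sigmoid(out[..., 1:])             # colors in [0, 1]
        return sigma, rgb
```

Because the MLP consumes aggregated scene features instead of raw coordinates, the same weights can in principle be shared across scenes, which is the property the abstract refers to as cross-scene generalization.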