
An attention-embedded GAN for SVBRDF recovery from a single image

Zeqi Shi, Xiangyu Lin, Ying Song (corresponding author)
Zhejiang Sci-Tech University, 2011 Collaborative Innovation Center for Garment Personal Customization of Zhejiang Province, Hangzhou 310018, China

Abstract

Learning-based approaches have made substantial progress in capturing spatially-varying bidirectional reflectance distribution functions (SVBRDFs) from a single image with unknown lighting and geometry. However, most existing networks only consider per-pixel losses, which limits their ability to recover local features such as smooth glossy regions. A few generative adversarial networks use multiple discriminators for different parameter maps, increasing network complexity. We present a novel end-to-end generative adversarial network (GAN) that recovers appearance from a single picture of a nearly-flat surface lit by flash. We use a single unified adversarial framework for all parameter maps, and an attention module guides the network to focus on details of the maps. Furthermore, an SVBRDF map loss is combined with the adversarial loss to prevent the network from paying excessive attention to specular highlights. We demonstrate and evaluate our method on both public datasets and real data. Quantitative analysis and visual comparisons indicate that our method achieves better results than the state of the art in most cases.

Keywords: attention mechanism, generative adversarial network (GAN), spatially-varying bidirectional reflectance distribution function (SVBRDF), appearance capture
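The abstract names two ingredients: an attention module that steers the generator toward details in the parameter maps, and a generator objective that pairs a single adversarial term with a per-pixel SVBRDF map loss so specular highlights do not dominate training. The sketch below (PyTorch) only illustrates how such pieces could be wired up; it is not the authors' implementation, and every layer choice, tensor shape, and the `map_weight` value are illustrative assumptions.

```python
# Minimal sketch, NOT the paper's exact architecture: (a) a spatial attention
# gate that re-weights decoder features, (b) a generator loss that combines one
# adversarial term with an L1 loss over the stacked SVBRDF maps.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttention(nn.Module):
    """Predicts a per-pixel weight in [0, 1] and rescales the feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        attn = torch.sigmoid(self.score(feat))   # B x 1 x H x W attention mask
        return feat * attn                       # emphasized features


def generator_loss(d_fake_logits: torch.Tensor,
                   pred_maps: torch.Tensor,
                   gt_maps: torch.Tensor,
                   map_weight: float = 10.0) -> torch.Tensor:
    """Single adversarial term plus an L1 loss over the stacked SVBRDF maps
    (e.g., normal, diffuse, roughness, specular), instead of one discriminator
    per parameter map; map_weight is an assumed balancing factor."""
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    map_l1 = F.l1_loss(pred_maps, gt_maps)
    return adv + map_weight * map_l1
```

In a design of this kind, the adversarial term encourages sharp local structure while the L1 map term anchors the prediction to the ground-truth maps, which is one way to keep bright highlights in the input from overwhelming the recovered reflectance.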

Electronic supplementary material
41095_0289_ESM.rar (115.1 MB)

Publication history

Received: 20 January 2022
Accepted: 20 April 2022
Published: 22 March 2023
Issue date: September 2023

Copyright

© The Author(s) 2023.

Acknowledgements

The authors would like to thank Jie Guo from Nanjing University for his kind help with the comparison. Ying Song was partially supported by the National Natural Science Foundation of China (No. 61602416) and Shaoxing Science and Technology Plan Project (No. 2020B41006).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
