Volume 9, Issue 3




Discriminative feature encoding for intrinsic image decomposition

Zongji Wang (1), Yunfei Liu (2), Feng Lu (2, 3)
(1) Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
(2) State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
(3) Peng Cheng Laboratory, Shenzhen 518000, China

Abstract

Intrinsic image decomposition is an important and long-standing computer vision problem. Given an input image, recovering the physical scene properties is ill-posed. Several physically motivated priors have been used to restrict the solution space of the optimization problem for intrinsic image decomposition. This work takes advantage of deep learning, and shows that it can solve this challenging computer vision problem with high efficiency. The focus lies in the feature encoding phase, which extracts discriminative features for different intrinsic layers from an input image. To achieve this goal, we explore the distinctive characteristics of different intrinsic components in the high-dimensional feature embedding space. We define a feature distribution divergence to efficiently separate the feature vectors of different intrinsic components. The feature distributions are also constrained to fit the real ones through a feature distribution consistency. In addition, a data refinement approach is provided to remove data inconsistency from the Sintel dataset, making it more suitable for intrinsic image decomposition. Our method is also extended to intrinsic video decomposition based on pixel-wise correspondences between adjacent frames. Experimental results indicate that our proposed network structure can outperform existing state-of-the-art methods.
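
The ill-posedness mentioned in the abstract can be illustrated with a small sketch. It assumes the common multiplicative (Lambertian) formation model, in which each pixel intensity is the product of an albedo value and a shading value; the abstract does not state this model explicitly, and the function names and sample values below are hypothetical. Under that assumption, scaling albedo by a factor and shading by its inverse yields a different decomposition of exactly the same image, which is why additional priors or learned constraints are needed.

```python
# Illustrative sketch only: the multiplicative model I = A * S is the common
# Lambertian assumption, not a detail taken from the abstract.

def compose(albedo, shading):
    """Recombine per-pixel albedo and shading into image intensities."""
    return [a * s for a, s in zip(albedo, shading)]


def rescale(albedo, shading, k):
    """Build an alternative decomposition: albedo times k, shading divided by k."""
    return [a * k for a in albedo], [s / k for s in shading]


albedo = [0.2, 0.5, 0.8]     # hypothetical reflectance values
shading = [1.0, 0.5, 0.25]   # hypothetical illumination values
image = compose(albedo, shading)

alt_albedo, alt_shading = rescale(albedo, shading, 2.0)
alt_image = compose(alt_albedo, alt_shading)

# Two different (albedo, shading) pairs explain the same observed image.
same = all(abs(p - q) < 1e-12 for p, q in zip(image, alt_image))
```

Here `same` is true even though `alt_albedo` differs from `albedo`, so the observed image alone cannot tell the two decompositions apart.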

Keywords: deep learning, intrinsic image decomposition, feature distribution, data refinement

References (60)

[1]
Baslamisli, A. S.; Le, H. A.; Gevers, T. CNN based learning using reflection and retinex models for intrinsic image decomposition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6674–6683, 2018.
[2]
Land, E. H.; McCann, J. J. Lightness and retinex theory. Journal of the Optical Society of America Vol. 61, No. 1, 1–11, 1971.
[3]
Gehler, P. V.; Rother, C.; Kiefel, M.; Zhang, L. M.; Schölkopf, B. Recovering intrinsic images with a global sparsity prior on reflectance. In: Proceedings of the 24th International Conference on Neural Information Processing Systems, 765–773, 2011.
[4]
Shen, L.; Yeo, C. Intrinsic images decomposition using a local and global sparse representation of reflectance. In: Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference, 697–704, 2011.
[5]
Shen, L.; Tan, P.; Lin, S. Intrinsic image decomposition with non-local texture cues. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1–7, 2008.
[6]
Zhao, Q.; Tan, P.; Dai, Q.; Shen, L.; Wu, E. H.; Lin, S. A closed-form solution to retinex with nonlocal texture constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 34, No. 7, 1437–1444, 2012.
[7]
Barron, J. T.; Malik, J. Shape, illumination, and reflectance from shading. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 37, No. 8, 1670–1687, 2015.
[8]
Bousseau, A.; Paris, S.; Durand, F. User-assisted intrinsic images. In: Proceedings of the ACM SIGGRAPH Asia 2009 Papers, Article No. 130, 2009.
[9]
Shen, J. B.; Yang, X. S.; Li, X. L.; Jia, Y. D. Intrinsic image decomposition using optimization and user scribbles. IEEE Transactions on Cybernetics Vol. 43, No. 2, 425–436, 2013.
[10]
Grosse, R.; Johnson, M. K.; Adelson, E. H.; Freeman, W. T. Ground truth dataset and baseline evaluations for intrinsic image algorithms. In: Proceedings of the IEEE 12th International Conference on Computer Vision, 2335–2342, 2009.
[11]
Butler, D. J.; Wulff, J.; Stanley, G. B.; Black, M. J. A naturalistic open source movie for optical flow evaluation. In: Computer Vision – ECCV 2012. Lecture Notes in Computer Science, Vol. 7577. Fitzgibbon, A.; Lazebnik, S.; Perona, P.; Sato, Y.; Schmid, C. Eds. Springer Berlin Heidelberg, 611–625, 2012.
[12]
Bell, S.; Bala, K.; Snavely, N. Intrinsic images in the wild. ACM Transactions on Graphics Vol. 33, No. 4, Article No. 159, 2014.
[13]
Narihira, T.; Maire, M.; Yu, S. X. Learning lightness from human judgement on relative reflectance. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2965–2973, 2015.
[14]
Narihira, T.; Maire, M.; Yu, S. X. Direct intrinsics: Learning albedo-shading decomposition by convolutional regression. In: Proceedings of the IEEE International Conference on Computer Vision, 2992–3000, 2015.
[15]
Shi, J.; Dong, Y.; Su, H.; Yu, S. X. Learning non-lambertian object intrinsics across ShapeNet categories. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5844–5853, 2017.
[16]
Fan, Q. N.; Yang, J. L.; Hua, G.; Chen, B. Q.; Wipf, D. Revisiting deep intrinsic image decompositions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8944–8952, 2018.
[17]
Li, Z.; Snavely, N. CGIntrinsics: Better intrinsic image decomposition through physically-based rendering. In: Computer Vision – ECCV 2018. Lecture Notes in Computer Science, Vol. 11207. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 381–399, 2018.
[18]
Wang, Z. J.; Lu, F. Single image intrinsic decomposition with discriminative feature encoding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop, 4310–4319, 2019.
[19]
Bonneel, N.; Kovacs, B.; Paris, S.; Bala, K. Intrinsic decompositions for image editing. Computer Graphics Forum Vol. 36, No. 2, 593–609, 2017.
[20]
Weiss, Y. Deriving intrinsic images from image sequences. In: Proceedings of the 8th IEEE International Conference on Computer Vision, 68–75, 2002.
[21]
Matsushita, Y.; Lin, S.; Kang, S. B.; Shum, H. Y. Estimating intrinsic images from image sequences with biased illumination. In: Computer Vision – ECCV 2004. Lecture Notes in Computer Science, Vol. 3022. Pajdla, T.; Matas, J. Eds. Springer Berlin Heidelberg, 274–286, 2004.
[22]
Laffont, P. Y.; Bazin, J. C. Intrinsic decomposition of image sequences from local temporal variations. In: Proceedings of the IEEE International Conference on Computer Vision, 433–441, 2015.
[23]
Li, Z. Q.; Snavely, N. Learning intrinsic image decomposition from watching the world. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9039–9048, 2018.
[24]
Lettry, L.; Vanhoey, K.; Van Gool, L. Unsupervised deep single-image intrinsic decomposition using illumination-varying image sequences. Computer Graphics Forum Vol. 37, No. 7, 409–419, 2018.
[25]
Gong, W. Y.; Xu, W. H.; Wu, L. Q.; Xie, X. H.; Cheng, Z. L. Intrinsic image sequence decomposition using low-rank sparse model. IEEE Access Vol. 7, 4024–4030, 2018.
[26]
Liu, Y. F.; Lu, F. Separate in latent space: Unsupervised single image layer separation. Proceedings of the AAAI Conference on Artificial Intelligence Vol. 34, No. 7, 11661–11668, 2020.
[27]
Barron, J. T.; Malik, J. Intrinsic scene properties from a single RGB-D image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 17–24, 2013.
[28]
Chen, Q. F.; Koltun, V. A simple model for intrinsic image decomposition with depth cues. In: Proceedings of the IEEE International Conference on Computer Vision, 241–248, 2013.
[29]
Lee, K. J.; Zhao, Q.; Tong, X.; Gong, M. M.; Izadi, S.; Lee, S. U.; Tan, P.; Lin, S. Estimation of intrinsic image sequences from Image+Depth video. In: Computer Vision – ECCV 2012. Lecture Notes in Computer Science, Vol. 7577. Fitzgibbon, A.; Lazebnik, S.; Perona, P.; Sato, Y.; Schmid, C. Eds. Springer Berlin Heidelberg, 327–340, 2012.
[30]
Kim, S.; Park, K.; Sohn, K.; Lin, S. Unified depth prediction and intrinsic image decomposition from a single image via joint convolutional neural fields. In: Computer Vision – ECCV 2016. Lecture Notes in Computer Science, Vol. 9912. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 143–159, 2016.
[31]
Cheng, Z. A.; Zheng, Y. Q.; You, S. D.; Sato, I. Non-local intrinsic decomposition with near-infrared priors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2521–2530, 2019.
[32]
Bi, S.; Han, X. G.; Yu, Y. Z. An L1 image transform for edge-preserving smoothing and scene-level intrinsic decomposition. ACM Transactions on Graphics Vol. 34, No. 4, Article No. 78, 2015.
[33]
Li, Y.; Brown, M. S. Single image layer separation using relative smoothness. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2752–2759, 2014.
[34]
Sheng, B.; Li, P.; Jin, Y. X.; Tan, P.; Lee, T. Y. Intrinsic image decomposition with step and drift shading separation. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 2, 1332–1346, 2020.
[35]
Fu, X. Y.; Zeng, D. L.; Huang, Y.; Zhang, X. P.; Ding, X. H. A weighted variational model for simultaneous reflectance and illumination estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2782–2790, 2016.
[36]
Fu, G.; Zhang, Q.; Xiao, C. X. Towards high-quality intrinsic images in the wild. In: Proceedings of the IEEE International Conference on Multimedia and Expo, 175–180, 2019.
[37]
Krebs, A.; Benezeth, Y.; Marzani, F. Intrinsic image decomposition as two independent deconvolution problems. Signal Processing: Image Communication Vol. 86, 115872, 2020.
[38]
Tang, Y.; Salakhutdinov, R.; Hinton, G. Deep Lambertian networks. In: Proceedings of the 29th International Conference on Machine Learning, 1419–1426, 2012.
[39]
Zhou, T. H.; Krähenbühl, P.; Efros, A. A. Learning data-driven reflectance priors for intrinsic image decomposition. In: Proceedings of the IEEE International Conference on Computer Vision, 3469–3477, 2015.
[40]
Zoran, D.; Isola, P.; Krishnan, D.; Freeman, W. T. Learning ordinal relationships for mid-level vision. In: Proceedings of the IEEE International Conference on Computer Vision, 388–396, 2015.
[41]
Nestmeyer, T.; Gehler, P. V. Reflectance adaptive filtering improves intrinsic image estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1771–1780, 2017.
[42]
Fu, G.; Zhang, Q.; Zhu, L.; Li, P.; Xiao, C. X. A multi-task network for joint specular highlight detection and removal. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7748–7757, 2021.
[43]
Seo, K.; Kinoshita, Y.; Kiya, H. Deep retinex network for estimating illumination colors with self-supervised learning. In: Proceedings of the IEEE 3rd Global Conference on Life Sciences and Technologies, 1–5, 2021.
[44]
Baslamisli, A. S.; Liu, Y.; Karaoglu, S.; Gevers, T. Physics-based shading reconstruction for intrinsic image decomposition. Computer Vision and Image Understanding Vol. 205, 103183, 2021.
[45]
Baslamisli, A. S.; Das, P.; Le, H. A.; Karaoglu, S.; Gevers, T. ShadingNet: Image intrinsics by fine-grained shading decomposition. International Journal of Computer Vision Vol. 129, No. 8, 2445–2473, 2021.
[46]
Zhu, Y. J.; Tang, J. J.; Li, S.; Shi, B. X. DeRenderNet: Intrinsic image decomposition of urban scenes with shape-(In)dependent shading rendering. In: Proceedings of the IEEE International Conference on Computational Photography, 1–11, 2021.
[47]
Sial, H. A.; Baldrich, R.; Vanrell, M. Deep intrinsic decomposition trained on surreal scenes yet with realistic light effects. Journal of the Optical Society of America A Vol. 37, No. 1, 1–15, 2019.
[48]
Chang, A. X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q. X.; Li, Z. M.; Savarese, S.; Savva, M.; Song, S.; Su, H.; et al. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012, 2015.
[49]
Kong, N.; Gehler, P. V.; Black, M. J. Intrinsic video. In: Computer Vision – ECCV 2014. Lecture Notes in Computer Science, Vol. 8690. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 360–375, 2014.
[50]
Ye, G. Z.; Garces, E.; Liu, Y. B.; Dai, Q. H.; Gutierrez, D. Intrinsic video and applications. ACM Transactions on Graphics Vol. 33, No. 4, Article No. 80, 2014.
[51]
Meka, A.; Zollhöfer, M.; Richardt, C.; Theobalt, C. Live intrinsic video. ACM Transactions on Graphics Vol. 35, No. 4, Article No. 109, 2016.
[52]
Lei, C.; Xing, Y.; Chen, Q. Blind video temporal consistency via deep video prior. In: Proceedings of the 34th Conference on Neural Information Processing Systems, 1083–1093, 2020.
[53]
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[54]
Deng, J.; Dong, W.; Socher, R.; Li, L. J.; Li, K.; Li, F. F. ImageNet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 248–255, 2009.
[55]
Johnson, J.; Alahi, A.; Li, F. F. Perceptual losses for real-time style transfer and super-resolution. In: Computer Vision – ECCV 2016. Lecture Notes in Computer Science, Vol. 9906. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 694–711, 2016.
[56]
Wang, Z.; Bovik, A. C.; Sheikh, H. R.; Simoncelli, E. P. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing Vol. 13, No. 4, 600–612, 2004.
[57]
Roweis, S. T.; Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science Vol. 290, No. 5500, 2323–2326, 2000.
[58]
Yao, C. H.; Chang, C. Y.; Chien, S. Y. Occlusion-aware video temporal consistency. In: Proceedings of the 25th ACM International Conference on Multimedia, 777–785, 2017.
[59]
Garces, E.; Munoz, A.; Lopez-Moreno, J.; Gutierrez, D. Intrinsic images by clustering. Computer Graphics Forum Vol. 31, No. 4, 1415–1424, 2012.
[60]
Barron, J. T.; Adams, A.; Shih, Y.; Hernández, C. Fast bilateral-space stereo for synthetic defocus. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4466–4474, 2015.

Publication history

Received: 11 January 2022
Accepted: 09 May 2022
Published: 18 April 2023
Issue date: September 2023

Copyright

© The Author(s) 2023.

Acknowledgements

Portions of this work were presented at the International Conference on Computer Vision Workshops in 2019 [18]. This work was supported by the Special Funds for Creative Research (Grant No. 2022C61540) and the National Natural Science Foundation of China (NSFC, Grant Nos. 61972012 and 61732016).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
