



Joint self-supervised and reference-guided learning for depth inpainting

Heng Wu1,*, Kui Fu1,*, Yifan Zhao1, Haokun Song1, Jia Li1 (✉)
1 State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China

*Heng Wu and Kui Fu contributed equally to this work.

Abstract

Depth information can benefit various computer vision tasks on both images and videos. However, depth maps may suffer from invalid values at many pixels as well as from large holes. To improve such data, we propose a joint self-supervised and reference-guided learning approach for depth inpainting. For the self-supervised learning strategy, we introduce an improved spatial convolutional sparse coding module in which total variation regularization is employed to enhance structural information while preserving edges. This module alternately learns a convolutional dictionary and sparse codes from a corrupted depth map. The learned dictionary and codes are then convolved to yield an initial depth map that is effectively smoothed using local contextual information. The reference-guided learning part is inspired by the observation that adjacent pixels with close colors in the RGB image tend to have similar depth values. We thus construct a hierarchical joint bilateral filter module that uses the corresponding color image to fill in large holes. In summary, our approach integrates a convolutional sparse coding module that preserves local contextual information with a hierarchical joint bilateral filter module that fills holes using relevant adjacent information. Experimental results show that the proposed approach works well for both invalid value restoration and large hole inpainting.
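To make the self-supervised part concrete, a masked convolutional sparse coding objective with a total variation term can be written as follows. This is our own notation, not necessarily the exact formulation in the full text: $y$ is the corrupted depth map, $M$ a binary mask of valid pixels, $\odot$ element-wise multiplication, $d_k$ the dictionary filters, and $z_k$ the sparse coefficient maps.

$$
\min_{\{d_k\},\{z_k\}} \; \frac{1}{2}\left\| M \odot \left( \sum_{k=1}^{K} d_k * z_k - y \right) \right\|_2^2 \;+\; \lambda \sum_{k=1}^{K} \left\| z_k \right\|_1 \;+\; \gamma \, \mathrm{TV}\!\left( \sum_{k=1}^{K} d_k * z_k \right)
$$

The optimization alternates between updating the codes $\{z_k\}$ with the dictionary fixed and updating the dictionary $\{d_k\}$ with the codes fixed; the $\ell_1$ term enforces sparsity, while the TV term promotes piecewise-smooth structure without blurring depth edges. The reconstruction $\sum_k d_k * z_k$ then serves as the initial inpainted depth map.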
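The reference-guided part can likewise be illustrated with a small sketch. The following Python/NumPy code is a minimal, hypothetical implementation of a hierarchical joint bilateral fill, assuming zero marks invalid depth and the color image is registered to the depth map; the function names, parameters, and the 2× pyramid scheme are our illustrative choices, not the authors' implementation.

```python
import numpy as np

def joint_bilateral_fill(depth, rgb, radius=5, sigma_s=3.0, sigma_r=0.1):
    """Fill invalid (zero) depth pixels with a joint bilateral filter.

    depth: (H, W) float array, 0 marks invalid pixels (assumption).
    rgb:   (H, W, 3) float array in [0, 1], registered to the depth map.
    """
    H, W = depth.shape
    out = depth.copy()
    # Spatial Gaussian over a (2*radius+1)^2 window, centered at (radius, radius).
    ax = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(ax, ax, indexing="ij")
    spatial = np.exp(-(dy**2 + dx**2) / (2.0 * sigma_s**2))
    for y, x in zip(*np.where(depth == 0)):
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        x0, x1 = max(0, x - radius), min(W, x + radius + 1)
        d_win = depth[y0:y1, x0:x1]
        valid = (d_win > 0).astype(float)
        if valid.sum() == 0:
            continue  # hole wider than the window; left for a coarser level
        # Range weight from color similarity to the center pixel.
        c_diff = rgb[y0:y1, x0:x1] - rgb[y, x]
        range_w = np.exp(-(c_diff**2).sum(axis=-1) / (2.0 * sigma_r**2))
        # Crop the spatial kernel to match the (possibly clipped) window.
        s_win = spatial[radius - (y - y0): radius + (y1 - y),
                        radius - (x - x0): radius + (x1 - x)]
        w = s_win * range_w * valid
        total = w.sum()
        if total > 1e-8:
            out[y, x] = (w * d_win).sum() / total
    return out

def hierarchical_fill(depth, rgb, levels=3, **kwargs):
    """Coarse-to-fine filling: fill a downsampled copy first, then use its
    result to seed holes the finer level's window cannot reach."""
    if levels > 1 and min(depth.shape) > 2 * (kwargs.get("radius", 5) + 1):
        coarse = hierarchical_fill(depth[::2, ::2], rgb[::2, ::2],
                                   levels - 1, **kwargs)
        # Nearest-neighbor upsample of the coarse result, cropped to size.
        up = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
        up = up[:depth.shape[0], :depth.shape[1]]
        depth = np.where(depth > 0, depth, up)
    return joint_bilateral_fill(depth, rgb, **kwargs)

# Example (hypothetical data): filled = hierarchical_fill(depth, rgb, levels=3)
```

Each invalid pixel is replaced by a weighted average of its valid neighbors, with weights combining spatial proximity and color similarity — directly encoding the observation that nearby pixels with close colors tend to share depth. Holes wider than the filter window are handled at the coarser pyramid levels, whose results seed the finer ones.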

Keywords: depth inpainting, self-supervised learning, reference-guided learning


Publication history

Received: 21 June 2021
Accepted: 01 October 2021
Published: 25 May 2022
Issue date: December 2022

Copyright

© The Author(s) 2022.

Acknowledgements

The authors would like to thank Z.-X. Ma for his help with the experiments. This work was partially supported by a grant from the National Natural Science Foundation of China (No. 61922006).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
