Global video object segmentation with spatial constraint module

Yadang Chen; Duolin Wang; Zhiguo Chen; Zhi-Xin Yang; Enhua Wu

doi:10.1007/s41095-022-0282-8


Research Article | Open Access


Yadang Chen¹, Duolin Wang¹, Zhiguo Chen¹, Zhi-Xin Yang², Enhua Wu³,⁴

1 Engineering Research Center of Digital Forensics, Ministry of Education, School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China

2 State Key Laboratory of Internet of Things for Smart City, Department of Electromechanical Engineering, University of Macau, Macau 999078, China

3 State Key Laboratory of Computer Science, Institute of Software, University of Chinese Academy of Sciences, Beijing 100190, China

4 Faculty of Science and Technology, University of Macau, Macau 999078, China


Abstract

We present a lightweight and efficient semi-supervised video object segmentation network based on the space-time memory framework. Our method partially addresses two difficulties of traditional video object segmentation: the per-frame computation time is too long, and the segmentation of the current frame makes too little use of information from past frames. The algorithm uses a global context (GC) module to achieve high-performance, real-time segmentation. The GC module effectively integrates information from multiple frames without increasing memory use and can process each frame in real time. Moreover, because the prediction mask of the previous frame helps segment the current frame, we feed it into a spatial constraint module (SCM), which constrains the regions where segments can appear in the current frame. The SCM effectively alleviates mismatching between similar targets while consuming few additional resources. We also add a refinement module to the decoder to improve boundary segmentation. Our model achieves state-of-the-art results on various datasets, scoring 80.1% on YouTube-VOS 2018 and a $\mathcal{J}\&\mathcal{F}$ score of 78.0% on DAVIS 2017, while taking 0.05 s per frame on the DAVIS 2016 validation set.
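The claim that the GC module merges many frames "without increasing memory use" can be illustrated with a small NumPy sketch. It assumes the context-matrix formulation of the global context module of Li, Shen, and Shan (ECCV 2020), in which each memory frame's keys and values are collapsed into a fixed C_k × C_v matrix that is averaged over time and then read out with the current frame's query. The class name `GlobalContextMemory`, the softmax placement, and the running-average update are illustrative assumptions, not the authors' code.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class GlobalContextMemory:
    """Hypothetical fixed-size memory: one C_k x C_v context matrix for all past frames."""

    def __init__(self, key_dim, value_dim):
        self.context = np.zeros((key_dim, value_dim))
        self.num_frames = 0

    def update(self, keys, values):
        """keys: (HW, C_k), values: (HW, C_v) computed from one memory frame."""
        k = softmax(keys, axis=0)            # normalize each key channel over positions (assumption)
        frame_context = k.T @ values         # (C_k, C_v), independent of HW
        self.num_frames += 1
        # Running mean: C_t = (t - 1) / t * C_{t-1} + 1 / t * C'_t
        self.context += (frame_context - self.context) / self.num_frames

    def read(self, query):
        """query: (HW, C_k) from the current frame; returns (HW, C_v) features."""
        q = softmax(query, axis=1)           # normalize each position over channels (assumption)
        return q @ self.context


rng = np.random.default_rng(0)
mem = GlobalContextMemory(key_dim=64, value_dim=128)
for _ in range(10):                          # ten memory frames on a 24 x 24 feature grid
    mem.update(rng.standard_normal((24 * 24, 64)), rng.standard_normal((24 * 24, 128)))
out = mem.read(rng.standard_normal((24 * 24, 64)))
print(mem.context.shape, out.shape)          # (64, 128) (576, 128)
```

Because the context matrix keeps the same shape however many frames are added, memory use stays constant, whereas a space-time memory network in the style of STM (Oh et al., ICCV 2019) stores keys and values for every memory frame and grows with the video length.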

Keywords

video object segmentation; semantic segmentation; global context (GC) module; spatial constraint


Computational Visual Media, Volume 9, Issue 2, June 2023, Pages 385-400


Cite this article:

Chen Y, Wang D, Chen Z, et al. Global video object segmentation with spatial constraint module. Computational Visual Media, 2023, 9(2): 385-400. https://doi.org/10.1007/s41095-022-0282-8


Received: 31 December 2021

Accepted: 05 March 2022

Published: 03 January 2023

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.