Volume 9, Issue 4




Sequential interactive image segmentation

Zheng Lin¹, Zhao Zhang¹, Zi-Yue Zhu¹, Deng-Ping Fan² (corresponding author), Xia-Lei Liu¹
1. TKLNDST, College of Computer Science, Nankai University, Tianjin, China
2. Computer Vision Lab, ETH Zurich, Zurich, Switzerland

Abstract

Interactive image segmentation (IIS) is an important technique for obtaining pixel-level annotations. In many cases, target objects share similar semantics. However, existing IIS methods neglect this connection, and in particular the cues provided by representations of previously segmented objects, previous user interactions, and previous prediction masks, all of which can provide useful priors for the current annotation. In this paper, we formulate a sequential interactive image segmentation (SIIS) task that aims to minimize user interaction when segmenting sequences of related images, and we provide a practical approach to this task built on two designs. The first is a novel interaction mode: when annotating a new sample, our method automatically proposes an initial click based on previous annotations, which dramatically reduces the interaction burden on the user. The second is an online optimization strategy that provides semantic information about the specific targets being annotated, optimizing the model with dense supervision from previously labeled samples. Experiments demonstrate the effectiveness of treating SIIS as a distinct task, and of our methods for addressing it.

Keywords: user interaction, object segmentation, interactive segmentation
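The two designs described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `propose_initial_click` and `online_update` are hypothetical stand-ins, using a confidence-peak heuristic for the automatic click proposal and a logistic-regression update as a minimal proxy for densely supervised fine-tuning on previously labeled samples.

```python
import numpy as np

def propose_initial_click(prob_map, threshold=0.5):
    """Propose an initial foreground click for a new image in the sequence.

    Hypothetical heuristic: place the click at the most confident pixel of
    the model's foreground probability map, which is informed by previously
    segmented objects. Returns None if no pixel is confident enough, in
    which case the user provides the first click as in ordinary IIS.
    """
    if prob_map.max() < threshold:
        return None
    return np.unravel_index(np.argmax(prob_map), prob_map.shape)

def online_update(params, feats, labels, lr=0.1, steps=20):
    """Online optimization stand-in: fit a per-pixel logistic classifier
    with dense (per-pixel) supervision from previously labeled samples.

    feats: (N, D) per-pixel features; labels: (N,) binary masks flattened.
    """
    w, b = params
    for _ in range(steps):
        z = feats @ w + b
        p = 1.0 / (1.0 + np.exp(-z))      # predicted foreground probability
        grad = p - labels                  # gradient of the logistic loss
        w = w - lr * feats.T @ grad / len(labels)
        b = b - lr * grad.mean()
    return w, b
```

In a sequential annotation loop, each newly confirmed mask would both seed `propose_initial_click` for the next image and extend the densely supervised set passed to `online_update`, so later samples in the sequence need progressively fewer corrections.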


Publication history

Received: 11 February 2022
Accepted: 17 June 2022
Published: 05 July 2023
Issue date: December 2023

Copyright

© The Author(s) 2023.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

