S4Net: Single stage salient-instance segmentation

Ruochen Fan; Ming-Ming Cheng; Qibin Hou; Tai-Jiang Mu; Jingdong Wang; Shi-Min Hu

doi:10.1007/s41095-020-0173-9

Computational Visual Media 2020, 6(2): 191-204 https://doi.org/10.1007/s41095-020-0173-9

Research Article | Open Access | Published: 10 June 2020

Ruochen Fan¹, Ming-Ming Cheng², Qibin Hou², Tai-Jiang Mu¹, Jingdong Wang³, Shi-Min Hu¹

1 BNRist, Tsinghua University, Beijing 100086, China.

2 Nankai University, Tianjin 300071, China.

3 MSRA, Beijing 100086, China.

Keywords: salient-instance segmentation, salient object detection, single stage, region-of-interest masking

Cite this article:

Fan R, Cheng M-M, Hou Q, et al. S4Net: Single stage salient-instance segmentation. Computational Visual Media, 2020, 6(2): 191-204. https://doi.org/10.1007/s41095-020-0173-9

Abstract

In this paper, we consider salient-instance segmentation. In addition to bounding boxes, our network outputs high-quality instance-level segments that serve as initial selections indicating the regions of interest. Taking into account the category-independent property of each target, we design a single-stage salient-instance segmentation framework with a novel segmentation branch. The new branch considers not only the local context inside each detection window but also its surrounding context, enabling us to distinguish instances in the same scope even under partial occlusion. Our network is end-to-end trainable and fast (running at 40 fps for images with resolution $320 \times 320$). We evaluate our approach on a publicly available benchmark and show that it outperforms alternative solutions. We also provide a thorough analysis of our design choices to help readers better understand the function of each part of our network. Source code can be found at https://github.com/RuochenFan/S4Net.
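The abstract says the segmentation branch looks at both the inside of each detection window and its surrounding context. The following is only an illustrative sketch of that idea, not the authors' implementation: it builds a per-box label grid where cells inside the box, cells in an expanded ring around it, and all other cells get different labels. The function name `context_mask`, the label values 1, -1 and 0, and the `expand` ratio are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x0, y0, x1, y1), end-exclusive.
Box = Tuple[int, int, int, int]


def context_mask(height: int, width: int, box: Box,
                 expand: float = 0.5) -> List[List[int]]:
    """Label each grid cell relative to one detection box.

    1  -> inside the box (local context)
    -1 -> in the ring around the box (surrounding context)
    0  -> everywhere else
    The label values and the expansion ratio are illustrative assumptions.
    """
    x0, y0, x1, y1 = box
    # Grow the box by a fraction of its size on every side, clipped to the grid.
    dx = int((x1 - x0) * expand)
    dy = int((y1 - y0) * expand)
    ex0, ey0 = max(0, x0 - dx), max(0, y0 - dy)
    ex1, ey1 = min(width, x1 + dx), min(height, y1 + dy)

    mask = [[0] * width for _ in range(height)]
    for y in range(ey0, ey1):
        for x in range(ex0, ex1):
            inside = x0 <= x < x1 and y0 <= y < y1
            mask[y][x] = 1 if inside else -1
    return mask


if __name__ == "__main__":
    for row in context_mask(6, 6, (2, 2, 4, 4)):
        print(" ".join(f"{v:2d}" for v in row))
```

A mask of this kind could be multiplied element-wise with a feature map so that features inside the window and features from its surroundings stay distinguishable; how S4Net actually uses region-of-interest masking is described in the full text and the released source code.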

About this article

Publication history

Received: 16 September 2019

Revised: 16 September 2019

Accepted: 12 April 2020

Published: 10 June 2020

Issue date: June 2020

Acknowledgements

This research was supported by the National Natural Science Foundation of China (61521002, 61572264, 61620106008), the National Youth Talent Support Program, the Tianjin Natural Science Foundation (17JCJQJC43700, 18ZXZNGX00110), and the Fundamental Research Funds for the Central Universities (Nankai University, No. 63191501).

Rights and permissions

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.