Volume 6, Issue 2

S4Net: Single stage salient-instance segmentation

Ruochen Fan¹, Ming-Ming Cheng², Qibin Hou², Tai-Jiang Mu¹, Jingdong Wang³, Shi-Min Hu¹
¹ BNRist, Tsinghua University, Beijing 100086, China.
² Nankai University, Tianjin 300071, China.
³ MSRA, Beijing 100086, China.

Abstract

In this paper, we consider salient instance segmentation. In addition to producing bounding boxes, our network outputs high-quality instance-level segments that serve as initial selections indicating the regions of interest. Because each target is category-independent, we design a single-stage salient instance segmentation framework with a novel segmentation branch. This branch considers not only the local context inside each detection window but also the surrounding context, enabling it to distinguish instances in the same scope even under partial occlusion. Our network is end-to-end trainable and fast (running at 40 fps on images with resolution 320×320). We evaluate our approach on a publicly available benchmark and show that it outperforms alternative solutions. We also provide a thorough analysis of our design choices to help readers better understand the function of each part of the network. Source code can be found at https://github.com/RuochenFan/S4Net.
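
To make the idea of using both the local context inside a detection window and the context around it more concrete, the following is a minimal sketch, not the network's actual implementation. It builds a three-valued mask over a feature grid: cells inside the box, cells in a ring around the box, and everything else. The function names, the `margin` parameter, and the +1 / -1 / 0 encoding are assumptions made for illustration only; the abstract does not specify them.

```python
def ternary_context_mask(height, width, box, margin):
    """Return a height x width grid: 1 inside `box`, -1 in a ring of
    `margin` cells around it, 0 elsewhere.

    `box` is (x0, y0, x1, y1) in grid cells, upper bounds exclusive.
    The encoding and all names here are illustrative assumptions.
    """
    x0, y0, x1, y1 = box
    # Expanded window around the box, clipped to the grid.
    ex0, ey0 = max(0, x0 - margin), max(0, y0 - margin)
    ex1, ey1 = min(width, x1 + margin), min(height, y1 + margin)

    mask = [[0] * width for _ in range(height)]
    for y in range(ey0, ey1):
        for x in range(ex0, ex1):
            inside = x0 <= x < x1 and y0 <= y < y1
            mask[y][x] = 1 if inside else -1
    return mask


def apply_mask(features, mask):
    """Element-wise product of a 2D feature grid with the mask."""
    return [[f * m for f, m in zip(frow, mrow)]
            for frow, mrow in zip(features, mask)]


if __name__ == "__main__":
    mask = ternary_context_mask(height=6, width=6, box=(2, 2, 4, 4), margin=1)
    for row in mask:
        print(" ".join(f"{v:2d}" for v in row))
```

Under this assumed encoding, features inside the window keep their sign, features in the surrounding ring are sign-flipped, and features farther away are zeroed, so a later layer could in principle tell "inside" evidence from "nearby" evidence.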

Keywords: salient-instance segmentation, salient object detection, single stage, region-of-interest masking

Publication history

Received: 16 September 2019
Revised: 16 September 2019
Accepted: 12 April 2020
Published: 10 June 2020
Issue date: June 2020

Copyright

© The Author(s) 2020

Acknowledgements

This research was supported by the National Natural Science Foundation of China (61521002, 61572264, 61620106008), the National Youth Talent Support Program, the Tianjin Natural Science Foundation (17JCJQJC43700, 18ZXZNGX00110), and the Fundamental Research Funds for the Central Universities (Nankai University, No. 63191501).

Rights and permissions

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
