
Object Counting Using a Refinement Network

Authors: Lehan Sun, Junjie Ma, and Liping Jing
School of Science, Beijing Jiaotong University, Beijing 100044, China
Department of Computer Science and Technology, and Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing 100084, China
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Abstract

To address the scale variance and uneven distribution of objects in object-counting tasks, an algorithm called Refinement Network (RefNet) is proposed. Its top-down scheme sequentially aggregates multiscale features, which are laterally connected with low-level information. Trained with a multiresolution density regression loss, a set of intermediate density maps is estimated at each scale of a multiscale feature pyramid, and detail is gradually added to the density map through a coarse-to-fine refinement process to predict the final density map. We evaluate RefNet on three crowd-counting benchmark datasets, namely ShanghaiTech, UCF_CC_50, and UCSD, and our method achieves competitive performance in terms of mean absolute error (MAE) and root mean squared error (RMSE) compared with state-of-the-art approaches. We further extend RefNet to cell counting, illustrating its effectiveness on related counting tasks.
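
The abstract only outlines the architecture. As a rough illustration of how a top-down, laterally connected refinement head with a multiresolution density regression loss could be wired up, a minimal PyTorch-style sketch follows. It is not the authors' implementation: the backbone stages, channel widths, bilinear upsampling, and per-level MSE terms are assumptions made for illustration.

# Hypothetical sketch (not the authors' released code) of a top-down
# refinement head in the spirit of RefNet. It assumes a VGG-16-like
# backbone producing feature maps C2, C3, C4 at strides 4, 8, and 16.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementHead(nn.Module):
    def __init__(self, in_channels=(128, 256, 512), mid_channels=128):
        super().__init__()
        # 1x1 lateral convolutions project each backbone stage to a common width.
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, mid_channels, kernel_size=1) for c in in_channels
        )
        # One density-map predictor per pyramid level; the intermediate maps
        # are supervised by the multiresolution regression loss below.
        self.predictors = nn.ModuleList(
            nn.Conv2d(mid_channels, 1, kernel_size=1) for _ in in_channels
        )

    def forward(self, feats):
        # feats: [C2, C3, C4], ordered fine to coarse.
        laterals = [lat(f) for lat, f in zip(self.laterals, feats)]
        x = laterals[-1]                              # start at the coarsest level
        density_maps = [self.predictors[-1](x)]
        for i in range(len(laterals) - 2, -1, -1):
            # Upsample the coarse features and merge the lateral connection,
            # refining the estimate coarse-to-fine.
            x = F.interpolate(x, size=laterals[i].shape[-2:],
                              mode="bilinear", align_corners=False) + laterals[i]
            density_maps.append(self.predictors[i](x))
        return density_maps                           # coarsest ... finest

def multires_density_loss(pred_maps, gt_density):
    # Sum of per-level MSE terms between each intermediate prediction and the
    # ground-truth density map resized to the matching resolution. (A real
    # pipeline would rescale the resized ground truth so that its sum, i.e.
    # the object count, is preserved.)
    loss = 0.0
    for pred in pred_maps:
        gt = F.interpolate(gt_density, size=pred.shape[-2:],
                           mode="bilinear", align_corners=False)
        loss = loss + F.mse_loss(pred, gt)
    return loss

# Example with dummy backbone features for a 256 x 256 input image:
feats = [torch.randn(1, 128, 64, 64),   # C2, stride 4
         torch.randn(1, 256, 32, 32),   # C3, stride 8
         torch.randn(1, 512, 16, 16)]   # C4, stride 16
head = RefinementHead()
maps = head(feats)                      # density maps at 16x16, 32x32, 64x64
loss = multires_density_loss(maps, torch.rand(1, 1, 256, 256))

In this sketch, the lateral 1x1 convolutions inject low-level spatial detail into the upsampled coarse estimate, mirroring the coarse-to-fine refinement described in the abstract, while the per-level loss supervises every intermediate density map rather than only the final one.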

Keywords: object counting, Refinement Network (RefNet), scale variation, uneven distribution


Publication history

Received: 03 November 2021
Revised: 22 December 2021
Accepted: 23 December 2021
Published: 17 March 2022
Issue date: October 2022

Copyright

© The author(s) 2022.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
