
Object Counting Using a Refinement Network

Authors: Lehan Sun, Junjie Ma, and Liping Jing
School of Science, Beijing Jiaotong University, Beijing 100044, China
Department of Computer Science and Technology, and Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing 100084, China
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Abstract

To address the scale variance and uneven distribution of objects in object-counting tasks, an algorithm called Refinement Network (RefNet) is proposed. Its top-down scheme sequentially aggregates multiscale features, which are laterally connected with low-level information. Trained with a multiresolution density regression loss, a set of intermediate density maps is estimated at each scale of a multiscale feature pyramid, and detail is gradually added to the density map through a coarse-to-fine refinement process to predict the final density map. We evaluate RefNet on three crowd-counting benchmark datasets, namely ShanghaiTech, UCF_CC_50, and UCSD, and our method achieves competitive performance in terms of mean absolute error (MAE) and root mean squared error (RMSE) compared with state-of-the-art approaches. We further extend RefNet to cell counting, illustrating its effectiveness on related counting tasks.
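
The abstract only outlines the architecture. As a rough illustration of how a top-down, laterally connected refinement head with a multiresolution density regression loss could be wired up, a minimal PyTorch-style sketch follows. It is not the authors' implementation: the backbone stages, channel widths, bilinear upsampling, and per-level MSE terms are assumptions made for illustration.

# Hypothetical sketch (not the authors' released code) of a top-down
# refinement head in the spirit of RefNet. It assumes a VGG-16-like
# backbone producing feature maps C2, C3, C4 at strides 4, 8, and 16.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementHead(nn.Module):
    def __init__(self, in_channels=(128, 256, 512), mid_channels=128):
        super().__init__()
        # 1x1 lateral convolutions project each backbone stage to a common width.
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, mid_channels, kernel_size=1) for c in in_channels
        )
        # One density-map predictor per pyramid level; the intermediate maps
        # are supervised by the multiresolution regression loss below.
        self.predictors = nn.ModuleList(
            nn.Conv2d(mid_channels, 1, kernel_size=1) for _ in in_channels
        )

    def forward(self, feats):
        # feats: [C2, C3, C4], ordered fine to coarse.
        laterals = [lat(f) for lat, f in zip(self.laterals, feats)]
        x = laterals[-1]                              # start at the coarsest level
        density_maps = [self.predictors[-1](x)]
        for i in range(len(laterals) - 2, -1, -1):
            # Upsample the coarse features and merge the lateral connection,
            # refining the estimate coarse-to-fine.
            x = F.interpolate(x, size=laterals[i].shape[-2:],
                              mode="bilinear", align_corners=False) + laterals[i]
            density_maps.append(self.predictors[i](x))
        return density_maps                           # coarsest ... finest

def multires_density_loss(pred_maps, gt_density):
    # Sum of per-level MSE terms between each intermediate prediction and the
    # ground-truth density map resized to the matching resolution. (A real
    # pipeline would rescale the resized ground truth so that its sum, i.e.
    # the object count, is preserved.)
    loss = 0.0
    for pred in pred_maps:
        gt = F.interpolate(gt_density, size=pred.shape[-2:],
                           mode="bilinear", align_corners=False)
        loss = loss + F.mse_loss(pred, gt)
    return loss

# Example with dummy backbone features for a 256 x 256 input image:
feats = [torch.randn(1, 128, 64, 64),   # C2, stride 4
         torch.randn(1, 256, 32, 32),   # C3, stride 8
         torch.randn(1, 512, 16, 16)]   # C4, stride 16
head = RefinementHead()
maps = head(feats)                      # density maps at 16x16, 32x32, 64x64
loss = multires_density_loss(maps, torch.rand(1, 1, 256, 256))

In this sketch, the lateral 1x1 convolutions inject low-level spatial detail into the upsampled coarse estimate, mirroring the coarse-to-fine refinement described in the abstract, while the per-level loss supervises every intermediate density map rather than only the final one.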

Keywords: object counting, Refinement Network (RefNet), scale variation, uneven distribution


Publication history

Received: 03 November 2021
Revised: 22 December 2021
Accepted: 23 December 2021
Published: 17 March 2022
Issue date: October 2022

Copyright

© The author(s) 2022.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
