Improved Bag-of-Words Model for Person Re-identification

Lu Tian; Shengjin Wang

doi:10.26599/TST.2018.9010060

Tsinghua Science and Technology 2018, 23(2): 145-156 https://doi.org/10.26599/TST.2018.9010060

Open Access | Issue | Published: 02 April 2018

Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.

Keywords:

unsupervised learning, person re-identification, bag-of-words, feature fusion

Cite this article:

Tian L, Wang S. Improved Bag-of-Words Model for Person Re-identification. Tsinghua Science and Technology, 2018, 23(2): 145-156. https://doi.org/10.26599/TST.2018.9010060

Abstract

Person re-identification (person re-id) aims to match observations of pedestrians across different cameras. It is a challenging task in real-world surveillance systems and has drawn extensive attention from the community. Most existing methods are based on supervised learning, which requires a large amount of labeled data. In this paper, we develop a robust unsupervised learning approach for person re-id. We propose an improved Bag-of-Words (iBoW) model to describe and match pedestrians under different camera views. The proposed descriptor does not require any re-id labels and is robust against pedestrian variations. Experiments show that the proposed iBoW descriptor outperforms other unsupervised methods. By combining it with efficient metric learning algorithms, we obtain accuracy competitive with existing state-of-the-art methods on person re-identification benchmarks, including VIPeR, PRID450S, and Market1501.
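For readers unfamiliar with the general idea, the following is a minimal sketch of a generic Bag-of-Words descriptor used for unsupervised matching. It is not the iBoW model proposed in the paper, whose details are not given in the abstract. It assumes each image has already been reduced to a set of local feature vectors, and the function names (build_codebook, bow_histogram, rank_gallery) are hypothetical names chosen for illustration.

```python
import numpy as np


def build_codebook(features, k, iters=10, seed=0):
    """Learn k visual words with plain k-means (an assumed, generic choice)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    words = features[start].astype(float)
    for _ in range(iters):
        # Assign every local feature to its nearest visual word.
        dists = np.linalg.norm(features[:, None, :] - words[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old word if the cluster is empty
                words[j] = members.mean(axis=0)
    return words


def bow_histogram(features, words):
    """Quantize local features and return an L2-normalized word histogram."""
    dists = np.linalg.norm(features[:, None, :] - words[None, :, :], axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(words)).astype(float)
    norm = np.linalg.norm(counts)
    return counts / norm if norm > 0 else counts


def rank_gallery(query_hist, gallery_hists):
    """Rank gallery images by cosine similarity to the query, best first."""
    scores = gallery_hists @ query_hist
    return np.argsort(-scores)
```

No identity labels are used at any step: the codebook is learned from unlabeled local features, and matching is a nearest-neighbour ranking over histograms. A learned metric, as mentioned in the abstract, would replace the cosine similarity in rank_gallery.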


About this article

Publication history


Received: 04 December 2016

Revised: 22 January 2017

Accepted: 25 January 2017

Published: 02 April 2018

Issue date: April 2018

Acknowledgements

The work was supported by the National Natural Science Foundation of China (No. 61071135) and the National Science and Technology Support Program (No. 2013BAK02B04).