Volume 23, Issue 2

Improved Bag-of-Words Model for Person Re-identification

Lu Tian and Shengjin Wang
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.

Abstract

Person re-identification (person re-id) aims to match observations of pedestrians across different cameras. It is a challenging task in real-world surveillance systems and has drawn extensive attention from the community. Most existing methods are based on supervised learning, which requires a large amount of labeled data. In this paper, we develop a robust unsupervised learning approach for person re-id. We propose an improved Bag-of-Words (iBoW) model to describe and match pedestrians under different camera views. The proposed descriptor does not require any re-id labels and is robust against pedestrian variations. Experiments show that the proposed iBoW descriptor outperforms other unsupervised methods. Combined with efficient metric learning algorithms, it achieves accuracy competitive with existing state-of-the-art methods on person re-identification benchmarks, including VIPeR, PRID450S, and Market1501.

Keywords: unsupervised learning, person re-identification, bag-of-words, feature fusion


Publication history

Received: 04 December 2016
Revised: 22 January 2017
Accepted: 25 January 2017
Published: 02 April 2018
Issue date: April 2018

Copyright

© The author(s) 2018

Acknowledgements

The work was supported by the National Natural Science Foundation of China (No. 61071135) and the National Science and Technology Support Program (No. 2013BAK02B04).
