Tian L, Wang S. Improved Bag-of-Words Model for Person Re-identification. Tsinghua Science and Technology, 2018, 23(2): 145-156. https://doi.org/10.26599/TST.2018.9010060
Improved Bag-of-Words Model for Person Re-identification
Lu Tian, Shengjin Wang
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.
Abstract
Person re-identification (person re-id) aims to match observations of pedestrians across different cameras. It is a challenging task in real-world surveillance systems and has drawn extensive attention from the community. Most existing methods are based on supervised learning, which requires a large amount of labeled data. In this paper, we develop a robust unsupervised learning approach for person re-id. We propose an improved Bag-of-Words (iBoW) model to describe and match pedestrians under different camera views. The proposed descriptor does not require any re-id labels and is robust against pedestrian variations. Experiments show that the proposed iBoW descriptor outperforms other unsupervised methods. By combining it with efficient metric learning algorithms, we obtain competitive accuracy compared with existing state-of-the-art methods on person re-identification benchmarks, including VIPeR, PRID450S, and Market1501.
Keywords: unsupervised learning, person re-identification, bag-of-words, feature fusion
The work was supported by the National Natural Science Foundation of China (No. 61071135) and the National Science and Technology Support Program (No. 2013BAK02B04).