Volume 3, Issue 1

A Semi-Supervised Attention Model for Identifying Authentic Sneakers

Yang Yang, Nengjun Zhu, Yifeng Wu, Jian Cao, Dechuan Zhan, and Hui Xiong
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China.
Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
Alibaba Company, Hangzhou 310000, China.
Rutgers University, Newark, NJ 07102, USA.

Abstract

To protect consumers, as well as the manufacturers and sellers of the products they enjoy, it is important to develop convenient tools that help consumers distinguish authentic products from counterfeits. Advances in deep learning for fine-grained object recognition create new possibilities for identifying genuine products. In this paper, we develop a Semi-Supervised Attention (SSA) model that works in conjunction with YSneaker, a large-scale multiple-source dataset of sneakers from various brands together with their authentication results, to identify authentic sneakers. Specifically, the SSA model applies a self-attention structure to the different images of a labeled sneaker, and a novel prototypical loss is designed to exploit unlabeled data. The model represents each sneaker by a weighted average of the output feature representations of its images, where the weights are determined by an additional shallow neural network; this allows the SSA model to focus on the images that matter most for identification. A distinguishing feature of the SSA model is its ability to take advantage of unlabeled data, which helps to further reduce intra-class variation and yields a more discriminative feature embedding. To validate the model, we collect a large number of labeled and unlabeled sneaker images and perform extensive experiments. The results show that YSneaker, together with the proposed SSA architecture, can identify authentic sneakers with high accuracy.
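The abstract names two mechanisms: a weighted average of a sneaker's image features, with weights produced by a shallow network, and a prototypical loss that uses unlabeled data to reduce intra-class variation. The sketch below illustrates both under stated assumptions and is not the SSA paper's implementation. The attention score uses the tanh form from attention-based deep multiple-instance learning (Ilse, Tomczak, and Welling, 2018), and the unlabeled loss (soft assignment of each unlabeled embedding to class-mean prototypes, then an assignment-weighted squared distance) is a hypothetical stand-in for the paper's loss. All function names, the parameters `V` and `w`, and the toy values are illustrative.

```python
import math

# Hypothetical sketch, not the SSA paper's implementation. Names, parameters
# (V, w) and toy values below are illustrative assumptions.


def matvec(m, v):
    """Multiply a matrix m (list of rows) by a vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings, V, w):
    """Pool the per-image embeddings of one sneaker into a single vector.

    Score for image k: w^T tanh(V h_k), a one-hidden-layer network
    (assumed form, after attention-based MIL pooling). Scores are
    softmax-normalised, so the weights are positive and sum to 1.
    """
    scores = [sum(wi * math.tanh(ui) for wi, ui in zip(w, matvec(V, h)))
              for h in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, embeddings))
              for d in range(dim)]
    return pooled, weights


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_prototypes(labeled):
    """Prototype of each class = mean embedding of its labeled examples."""
    protos = {}
    for label in {y for _, y in labeled}:
        members = [z for z, y in labeled if y == label]
        protos[label] = [sum(col) / len(members) for col in zip(*members)]
    return protos


def unlabeled_prototype_loss(unlabeled, protos):
    """Hypothetical prototype loss on unlabeled embeddings.

    Each unlabeled embedding z is softly assigned to classes with
    p(c | z) = softmax(-||z - mu_c||^2); the loss is the assignment-weighted
    squared distance to the prototypes, averaged over the unlabeled set.
    Minimising it tends to pull z toward the prototype it is assigned to.
    """
    labels = sorted(protos)
    total = 0.0
    for z in unlabeled:
        dists = [sq_dist(z, protos[c]) for c in labels]
        probs = softmax([-d for d in dists])
        total += sum(p * d for p, d in zip(probs, dists))
    return total / len(unlabeled)


# Toy 2-D example with made-up numbers.
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, -1.0]
sneaker_images = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
pooled, weights = attention_pool(sneaker_images, V, w)

labeled = [([1.0, 0.0], "authentic"), ([0.8, 0.2], "authentic"),
           ([0.0, 1.0], "counterfeit")]
protos = class_prototypes(labeled)
loss = unlabeled_prototype_loss([pooled, [0.1, 0.9]], protos)
print(weights, loss)
```

Because the weights are softmax-normalised, the pooled vector is a convex combination of the image embeddings, so high-scoring images dominate the sneaker's representation. Pulling unlabeled embeddings toward class prototypes is one simple way to shrink intra-class spread; the paper's actual loss may differ.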

Keywords: fine-grained classification, attention mechanism, sneaker identification, multi-instance learning

Publication history

Received: 21 May 2019
Accepted: 25 September 2019
Published: 19 December 2019
Issue date: March 2020

Copyright

© The author(s) 2020

Acknowledgements

This research was supported by the National Key R&D Program of China (No. 2018YFB1004300), the National Natural Science Foundation of China (Nos. 61773198, 61632004, and 61751306), the National Natural Science Foundation of China-Korea Research Foundation Joint Research Project (No. 61861146001), the Collaborative Innovation Center of Novel Software Technology and Industrialization, and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX18-0045).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return