A Semi-Supervised Attention Model for Identifying Authentic Sneakers

Yang Yang; Nengjun Zhu; Yifeng Wu; Jian Cao; Dechuan Zhan; Hui Xiong

doi:10.26599/BDMA.2019.9020017

Open Access

∙ National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China.

∙ Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.

∙ Alibaba Company, Hangzhou 310000, China.

∙ Rutgers University, Newark, NJ 07102, USA.

Abstract

To protect consumers and those who manufacture and sell the products they enjoy, it is important to develop convenient tools to help consumers distinguish an authentic product from a counterfeit one. The advancement of deep learning techniques for fine-grained object recognition creates new possibilities for genuine product identification. In this paper, we develop a Semi-Supervised Attention (SSA) model to work in conjunction with a large-scale multiple-source dataset named YSneaker, which consists of sneakers from various brands and their authentication results, to identify authentic sneakers. Specifically, the SSA model applies a self-attention structure to the different images of a labeled sneaker, and a novel prototypical loss is designed to exploit unlabeled data within the data structure. The model draws on the weighted average of the output feature representations, where the weights are determined by an additional shallow neural network. This allows the SSA model to focus on the most important images of a sneaker for use in identification. A unique feature of the SSA model is its ability to take advantage of unlabeled data, which can help to further minimize the intra-class variation for more discriminative feature embedding. To validate the model, we collect a large number of labeled and unlabeled sneaker images and perform extensive experimental studies. The results show that YSneaker together with the proposed SSA architecture can identify authentic sneakers with high accuracy.
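
The weighted-average step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single tanh hidden layer used as the "shallow network", the softmax normalization, and the nearest-prototype squared-distance term are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def attention_pool(features, w_hidden, w_out):
    """Pool per-image feature vectors of one sneaker into a single vector.

    features: list of feature vectors, one per image (lists of floats).
    w_hidden: hidden-layer weight matrix (list of rows), hypothetical.
    w_out:    output weight vector, one weight per hidden unit, hypothetical.
    Returns (pooled_vector, attention_weights).
    """
    # Shallow scoring network (assumed form): score = w_out . tanh(w_hidden @ f)
    scores = []
    for f in features:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, f))) for row in w_hidden]
        scores.append(sum(v * h for v, h in zip(w_out, hidden)))

    # Softmax over images so the weights are positive and sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the image features.
    dim = len(features[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def nearest_prototype_loss(embedding, prototypes):
    """Assumed stand-in for a prototypical loss on one unlabeled sample:
    squared Euclidean distance to the closest class prototype."""
    return min(
        sum((e - p) ** 2 for e, p in zip(embedding, proto))
        for proto in prototypes.values()
    )


# Two images of one sneaker, each with a 2-dimensional feature vector.
features = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(features, w_hidden=[[1.0, 0.0]], w_out=[1.0])

# One unlabeled embedding and two hypothetical class prototypes.
loss = nearest_prototype_loss([0.0, 0.0], {"authentic": [1.0, 0.0], "fake": [3.0, 4.0]})
```

In this toy setup the first image receives the larger weight because it scores higher under the scoring network, so the pooled vector leans toward it. Pulling unlabeled embeddings toward their nearest prototype is one simple way to reduce intra-class variation; the paper's actual loss may differ.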

Keywords

fine-grained classification; attention mechanism; sneaker identification; multi-instance learning

Big Data Mining and Analytics

Volume 3, Issue 1, March 2020

Pages 29-40

DOI: 10.26599/BDMA.2019.9020017

Cite this article:

Yang Y, Zhu N, Wu Y, et al. A Semi-Supervised Attention Model for Identifying Authentic Sneakers. Big Data Mining and Analytics, 2020, 3(1): 29-40. https://doi.org/10.26599/BDMA.2019.9020017

Received: 21 May 2019

Accepted: 25 September 2019

Published: 19 December 2019

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).