Open Access

Multi-Label Prototype-Aware Structured Contrastive Distillation

School of Information Science and Technology, Yunnan Normal University, Kunming 650500, China
Key Laboratory of Education Informatization for Nationalities, Ministry of Education, Kunming 650500, China

Abstract

Knowledge distillation has demonstrated considerable success in multi-class single-label learning. However, applying it directly to multi-label learning is challenging: the complex correlations within multi-label structures cause student models to overlook the finer-grained semantic relations captured by the teacher model. In this paper, we present multi-label prototype-aware structured contrastive distillation, which comprises two modules: Prototype-aware Contrastive Representation Distillation (PCRD) and Prototype-aware Cross-image Structure Distillation (PCSD). The PCRD module maximizes the mutual information between the prototype-aware representations of the student and teacher, enforcing consistency of the semantic representation structure to improve intra-class compactness and inter-class dispersion of representations. The PCSD module introduces sample-to-sample and sample-to-prototype structured contrastive distillation to model prototype-aware cross-image structure consistency, guiding the student model to maintain a label semantic structure coherent with the teacher's across multiple instances. To stabilize prototype guidance, we further introduce batch-wise dynamic prototype correction for updating class prototypes. Experimental results on three public benchmark datasets validate the effectiveness of the proposed method and demonstrate its superiority over state-of-the-art methods.
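The abstract does not specify the exact losses, so the following short PyTorch sketch only illustrates the two named ingredients under stated assumptions: an InfoNCE-style contrastive surrogate for the student-teacher mutual-information objective (with sample-to-sample and sample-to-prototype terms), and a batch-wise exponential-moving-average prototype update standing in for dynamic prototype correction. All function names, tensor shapes, and the momentum rule are illustrative assumptions, not the paper's formulation.

import torch
import torch.nn.functional as F

def prototype_contrastive_distill(z_s, z_t, prototypes, labels, tau=0.1):
    """Hypothetical PCRD/PCSD-style loss: an InfoNCE surrogate for
    student-teacher mutual information plus a sample-to-prototype term.

    z_s, z_t:   (B, D) student / teacher embeddings for one batch
    prototypes: (C, D) class prototypes (assumed to live in the
                teacher's embedding space)
    labels:     (B, C) multi-hot label matrix
    """
    z_s = F.normalize(z_s, dim=1)
    z_t = F.normalize(z_t, dim=1)
    protos = F.normalize(prototypes, dim=1)
    labels = labels.float()

    # Sample-to-sample term: each student embedding must identify its own
    # teacher embedding among all teacher embeddings in the batch.
    logits_ss = z_s @ z_t.T / tau                     # (B, B)
    targets = torch.arange(z_s.size(0), device=z_s.device)
    loss_ss = F.cross_entropy(logits_ss, targets)

    # Sample-to-prototype term: student-to-prototype similarities should
    # match the normalized multi-hot label distribution, so prototypes of
    # active classes act as positives and all others as negatives.
    logits_sp = z_s @ protos.T / tau                  # (B, C)
    log_prob = F.log_softmax(logits_sp, dim=1)
    label_dist = labels / labels.sum(dim=1, keepdim=True).clamp(min=1.0)
    loss_sp = -(label_dist * log_prob).sum(dim=1).mean()

    return loss_ss + loss_sp

@torch.no_grad()
def update_prototypes(prototypes, z_t, labels, momentum=0.9):
    """Batch-wise prototype update, sketched as an exponential moving
    average between each class prototype and the mean teacher embedding
    of that class within the current batch (the EMA rule is an assumption
    standing in for the paper's dynamic prototype correction)."""
    z_t = F.normalize(z_t, dim=1)
    for c in range(prototypes.size(0)):
        mask = labels[:, c].bool()
        if mask.any():
            batch_mean = z_t[mask].mean(dim=0)
            prototypes[c] = momentum * prototypes[c] + (1.0 - momentum) * batch_mean
    return prototypes

In a training loop of this shape, one would refresh the prototypes from the frozen teacher's embeddings after each batch and feed the updated prototypes into the next loss computation.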

Tsinghua Science and Technology
Pages 1808–1830
Cite this article:
Xia Y, Tong Y, Yang J, et al. Multi-Label Prototype-Aware Structured Contrastive Distillation. Tsinghua Science and Technology, 2025, 30(4): 1808–1830. https://doi.org/10.26599/TST.2024.9010182


Received: 18 February 2024
Revised: 12 June 2024
Accepted: 24 September 2024
Published: 03 March 2025
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
