[2]
J. Tian, C. Shen, B. Wang, X. Xia, M. Zhang, C. Lin, and Q. Li, LESSON: Multi-label adversarial false data injection attack for deep learning locational detection, IEEE Trans. Depend. Secur. Comput., vol. 21, no. 5, pp. 4418–4432, 2024.
[4]
G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, arXiv preprint arXiv: 1503.02531, 2015.
[5]
B. Zhao, Q. Cui, R. Song, Y. Qiu, and J. Liang, Decoupled knowledge distillation, in Proc. 2022 IEEE/CVF Conf. Computer Vision and Pattern Recognition, New Orleans, LA, USA, 2022, pp. 11953–11962.
[6]
W. Park, D. Kim, Y. Lu, and M. Cho, Relational knowledge distillation, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019, pp. 3967–3976.
[7]
Y. Tian, D. Krishnan, and P. Isola, Contrastive representation distillation, arXiv preprint arXiv: 1910.10699, 2019.
[8]
J. Wang, Y. Yang, J. Mao, Z. Huang, C. Huang, and W. Xu, CNN-RNN: A unified framework for multi-label image classification, in Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp. 2285–2294.
[9]
Z. Wang, T. Chen, G. Li, R. Xu, and L. Lin, Multi-label image recognition by recurrently discovering attentional regions, in Proc. 2017 IEEE Int. Conf. Computer Vision, Venice, Italy, 2017, pp. 464–472.
[11]
Y. Wei, W. Xia, M. Lin, J. Huang, B. Ni, J. Dong, Y. Zhao, and S. Yan, HCP: A flexible CNN framework for multi-label image classification, IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 9, pp. 1901–1907, 2016.
[12]
R. You, Z. Guo, L. Cui, X. Long, Y. Bao, and S. Wen, Cross-modality attention with semantic graph embedding for multi-label classification, in Proc. 34th AAAI Conf. Artificial Intelligence, New York, NY, USA, 2020, pp. 12709–12716.
[13]
J. Zhao, K. Yan, Y. Zhao, X. Guo, F. Huang, and J. Li, Transformer-based dual relation graph for multi-label image recognition, in Proc. 2021 IEEE/CVF Int. Conf. Computer Vision, Montreal, Canada, 2021, pp. 163–172.
[14]
P. Yang, M. K. Xie, C. C. Zong, L. Feng, G. Niu, M. Sugiyama, and S. J. Huang, Multi-label knowledge distillation, in Proc. 2023 IEEE/CVF Int. Conf. Computer Vision, Paris, France, 2023, pp. 17271–17280.
[15]
Y. Liu, L. Sheng, J. Shao, J. Yan, S. Xiang, and C. Pan, Multi-label image classification via knowledge distillation from weakly-supervised detection, in Proc. 26th ACM Int. Conf. Multimedia, Seoul, Republic of Korea, 2018, pp. 700–708.
[16]
J. Xu, S. Huang, F. Zhou, L. Huangfu, D. Zeng, and B. Liu, Boosting multi-label image classification with complementary parallel self-distillation, in Proc. 31st Int. Joint Conf. Artificial Intelligence, Virtual Event, 2022, pp. 1495–1501.
[17]
L. Song, J. Wu, M. Yang, Q. Zhang, Y. Li, and J. Yuan, Handling difficult labels for multi-label image classification via uncertainty distillation, in Proc. 29th ACM Int. Conf. Multimedia, Virtual Event, 2021, pp. 2410–2419.
[18]
A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio, FitNets: Hints for thin deep nets, in Proc. 3rd Int. Conf. Learning Representations, San Diego, CA, USA, 2015. https://dblp.uni-trier.de/db/conf/iclr/iclr2015.html#RomeroBKCGB14.
[19]
N. Passalis and A. Tefas, Learning deep representations with probabilistic knowledge transfer, in Proc. 15th European Conf. Computer Vision, Munich, Germany, 2018, pp. 268–284.
[20]
P. Chen, S. Liu, H. Zhao, and J. Jia, Distilling knowledge via knowledge review, in Proc. 2021 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Nashville, TN, USA, 2021, pp. 5008–5017.
[21]
T. Ridnik, E. Ben-Baruch, N. Zamir, A. Noy, I. Friedman, M. Protter, and L. Zelnik-Manor, Asymmetric loss for multi-label classification, in Proc. 2021 IEEE/CVF Int. Conf. Computer Vision, Montreal, Canada, 2021, pp. 82–91.
[24]
T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, A simple framework for contrastive learning of visual representations, in Proc. 37th Int. Conf. Machine Learning, Virtual Event, 2020, p. 149.
[25]
A. van den Oord, Y. Li, and O. Vinyals, Representation learning with contrastive predictive coding, arXiv preprint arXiv: 1807.03748, 2018.
[26]
S. Wang, J. Gao, Z. Li, X. Zhang, and W. Hu, A closer look at self-supervised lightweight vision transformers, in Proc. 40th Int. Conf. Machine Learning, Honolulu, HI, USA, 2023, p. 1482.
[27]
P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, and D. Krishnan, Supervised contrastive learning, in Proc. 34th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, p. 1567.
[28]
A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., Learning transferable visual models from natural language supervision, in Proc. 38th Int. Conf. Machine Learning, Virtual Event, 2021, pp. 8748–8763.
[29]
J. Li, P. Zhou, C. Xiong, and S. C. Hoi, Prototypical contrastive learning of unsupervised representations, in Proc. 9th Int. Conf. Learning Representations, Virtual Event, 2021. https://dblp.uni-trier.de/db/conf/iclr/iclr2021.html#0001ZXH21.
[30]
S. D. Dao, E. Zhao, D. Phung, and J. Cai, Contrast learning visual attention for multi label classification, arXiv preprint arXiv: 2107.11626, 2021.
[32]
J. Liu, X. Guo, and Y. Yuan, Unknown-oriented learning for open set domain adaptation, in Proc. 17th European Conf. Computer Vision, Tel Aviv, Israel, 2022, pp. 334–350.
[33]
I. P. Singh, E. Ghorbel, A. Kacem, A. Rathinam, and D. Aouada, Discriminator-free unsupervised domain adaptation for multi-label image classification, in Proc. 2024 IEEE/CVF Winter Conf. Applications of Computer Vision, Waikoloa, HI, USA, 2024, pp. 3936–3945.
[35]
A. Mustafa, S. Khan, M. Hayat, R. Goecke, J. Shen, and L. Shao, Adversarial defense by restricting the hidden space of deep neural networks, in Proc. 2019 IEEE/CVF Int. Conf. Computer Vision, Seoul, Republic of Korea, 2019, pp. 3385–3394.
[36]
G. Li, V. Jampani, L. Sevilla-Lara, D. Sun, J. Kim, and J. Kim, Adaptive prototype learning and allocation for few-shot segmentation, in Proc. 2021 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Nashville, TN, USA, 2021, pp. 8334–8343.
[37]
K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp. 770–778.
[38]
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, in Proc. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 4510–4520.
[39]
X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, RepVGG: Making VGG-style ConvNets great again, in Proc. 2021 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Nashville, TN, USA, 2021, pp. 13733–13742.
[40]
Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in Proc. 2021 IEEE/CVF Int. Conf. Computer Vision, Montreal, Canada, 2021, pp. 10012–10022.
[41]
T. Ridnik, G. Sharir, A. Ben-Cohen, E. Ben-Baruch, and A. Noy, ML-Decoder: Scalable and versatile classification head, in Proc. 2023 IEEE/CVF Winter Conf. Applications of Computer Vision, Waikoloa, HI, USA, 2023, pp. 32–41.
[42]
F. Nielsen, A family of statistical symmetric divergences based on Jensen’s inequality, arXiv preprint arXiv: 1009.4004, 2010.
[44]
W. Bryc, The Normal Distribution. New York, NY, USA: Springer, 1995, p. 17.
[46]
T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, Microsoft COCO: Common objects in context, in Proc. 13th European Conf. Computer Vision, Zurich, Switzerland, 2014, pp. 740–755.
[47]
T. S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng, NUS-WIDE: A real-world web image database from National University of Singapore, in Proc. ACM Int. Conf. Image and Video Retrieval, Santorini, Greece, 2009, p. 48.
[48]
S. Zagoruyko and N. Komodakis, Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer, in Proc. 5th Int. Conf. Learning Representations, Toulon, France, 2017. https://dblp.uni-trier.de/db/conf/iclr/iclr2017.html#ZagoruykoK17.
[49]
Z. Guo, H. Yan, H. Li, and X. Lin, Class attention transfer based knowledge distillation, in Proc. 2023 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Vancouver, Canada, 2023, pp. 11868–11877.
[50]
Q. Lan and Q. Tian, Gradient-guided knowledge distillation for object detectors, in Proc. 2024 IEEE/CVF Winter Conf. Applications of Computer Vision, Waikoloa, HI, USA, 2024, pp. 424–433.
[51]
Z. M. Chen, X. S. Wei, X. Jin, and Y. Guo, Multi-label image recognition with joint class-aware map disentangling and label correlation embedding, in Proc. 2019 IEEE Int. Conf. Multimedia and Expo, Shanghai, China, 2019, pp. 622–627.
[52]
Z. M. Chen, Q. Cui, B. Zhao, R. Song, X. Zhang, and O. Yoshie, SST: Spatial and semantic transformers for multi-label image recognition, IEEE Trans. Image Process., vol. 31, pp. 2570–2583, 2022.
[53]
Y. Wu, H. Liu, S. Feng, Y. Jin, G. Lyu, and Z. Wu, GM-MLIC: Graph matching based multi-label image classification, in Proc. 30th Int. Joint Conf. Artificial Intelligence, Montreal, Canada, 2021, pp. 1179–1185.
[54]
S. D. Dao, E. Zhao, D. Phung, and J. Cai, Multi-label image classification with contrastive learning, arXiv preprint arXiv: 2107.11626, 2021.
[55]
R. Liu, H. Liu, G. Li, H. Hou, T. Yu, and T. Yang, Contextual debiasing for visual recognition with causal mechanisms, in Proc. 2022 IEEE/CVF Conf. Computer Vision and Pattern Recognition, New Orleans, LA, USA, 2022, pp. 12755–12765.
[56]
S. Liu, L. Zhang, X. Yang, H. Su, and J. Zhu, Query2Label: A simple transformer way to multi-label classification, arXiv preprint arXiv: 2107.10834, 2021.
[57]
E. D. Cubuk, B. Zoph, J. Shlens, and Q. V. Le, Randaugment: Practical automated data augmentation with a reduced search space, in Proc. 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 2020, pp. 702–703.
[58]
L. Yuan, F. E. H. Tay, G. Li, T. Wang, and J. Feng, Revisiting knowledge distillation via label smoothing regularization, in Proc. 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, WA, USA, 2020, pp. 3903–3911.
[61]
D. Huynh and E. Elhamifar, A shared multi-attention framework for multi-label zero-shot learning, in Proc. 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, WA, USA, 2020, pp. 8776–8786.
[62]
S. Narayan, A. Gupta, S. Khan, F. S. Khan, L. Shao, and M. Shah, Discriminative region-based multi-label zero-shot learning, in Proc. 2021 IEEE/CVF Int. Conf. Computer Vision, Montreal, Canada, 2021, pp. 8731–8740.
[63]
B. Poole, S. Ozair, A. Van Den Oord, A. Alemi, and G. Tucker, On variational bounds of mutual information, in Proc. 36th Int. Conf. Machine Learning, Long Beach, CA, USA, 2019, pp. 5171–5180.
[64]
J. Song and S. Ermon, Multi-label contrastive predictive coding, in Proc. 34th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, p. 684.
[68]
R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, in Proc. 2017 IEEE Int. Conf. Computer Vision, Venice, Italy, 2017, pp. 618–626.
[69]
M. Lerma and M. Lucas, Grad-CAM++ is equivalent to Grad-CAM with positive gradients, arXiv preprint arXiv: 2205.10838, 2022.
[70]
M. B. Muhammad and M. Yeasin, Eigen-CAM: Class activation map using principal components, in Proc. 2020 Int. Joint Conf. Neural Networks, Glasgow, UK, 2020, pp. 1–7.
[71]
S. Desai and H. G. Ramaswamy, Ablation-CAM: Visual explanations for deep convolutional network via gradient-free localization, in Proc. 2020 IEEE Winter Conf. Applications of Computer Vision, Snowmass, CO, USA, 2020, pp. 983–991.
[72]
H. Wang, Z. Wang, M. Du, F. Yang, Z. Zhang, S. Ding, P. Mardziel, and X. Hu, Score-CAM: Score-weighted visual explanations for convolutional neural networks, in Proc. 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 2020, pp. 24–25.
[75]
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16x16 words: Transformers for image recognition at scale, in Proc. 9th Int. Conf. Learning Representations, Virtual Event, 2021.
[76]
C. Chen, O. Li, D. Tao, A. J. Barnett, J. Su, and C. Rudin, This looks like that: Deep learning for interpretable image recognition, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, p. 801.
[77]
C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, Caltech-UCSD Birds-200-2011. Pasadena, CA, USA: California Institute of Technology, 2011.