The prevalence of long-tailed distributions in real-world data often causes classification models to favor the dominant classes and neglect the less frequent ones. Current approaches address long-tailed image classification by rebalancing data, optimizing weights, and augmenting information. However, these methods often struggle to balance performance between the dominant and minority classes because of inadequate representation learning for the latter. To address these problems, we introduce descriptive words for images as cross-modal privileged information and propose a cross-modal enhanced method for long-tailed image classification, referred to as CMLTNet. CMLTNet improves the intra-class similarity of tail-class representations through cross-modal alignment and captures the difference between head and tail classes in the semantic space through cross-modal inference. After fusing this information, CMLTNet outperformed benchmark long-tailed and cross-modal learning methods on the long-tailed cross-modal datasets NUS-WIDE and VireoFood-172. The effectiveness of the proposed modules was further verified through ablation experiments. A case study of feature distributions showed that the proposed model learns better representations of tail classes, and experiments on model attention indicated that CMLTNet can help learn rare concepts in tail classes by mapping them to the semantic space.
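Only the abstract is reproduced on this page, so the sketch below is purely illustrative of the general idea it describes: aligning image features with text (privileged-information) features in a shared semantic space and fusing the two classification branches. The module names, feature dimensions, cosine-based alignment loss, and loss weighting are assumptions made for this sketch, not CMLTNet's actual architecture.

    # Hypothetical PyTorch sketch of cross-modal alignment plus late fusion
    # for a long-tailed classifier; all design choices here are assumptions,
    # not the CMLTNet implementation reported in the paper.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossModalHead(nn.Module):
        def __init__(self, img_dim=2048, txt_dim=300, embed_dim=512, num_classes=172):
            super().__init__()
            self.img_proj = nn.Linear(img_dim, embed_dim)    # project visual backbone features
            self.txt_proj = nn.Linear(txt_dim, embed_dim)    # project word-vector features
            self.img_cls = nn.Linear(embed_dim, num_classes)  # visual-branch classifier
            self.sem_cls = nn.Linear(embed_dim, num_classes)  # classifier in the shared semantic space

        def forward(self, img_feat, txt_feat):
            z_img = F.normalize(self.img_proj(img_feat), dim=-1)
            z_txt = F.normalize(self.txt_proj(txt_feat), dim=-1)
            # Alignment term: pull matched image/text embeddings together,
            # encouraging intra-class similarity for tail-class representations.
            align_loss = 1.0 - (z_img * z_txt).sum(dim=-1).mean()
            # Late fusion of visual and semantic-space logits.
            logits = self.img_cls(z_img) + self.sem_cls(z_txt)
            return logits, align_loss

    # Usage: combine the classification and alignment objectives during training.
    model = CrossModalHead()
    img_feat = torch.randn(8, 2048)    # e.g., ResNet features
    txt_feat = torch.randn(8, 300)     # e.g., averaged word vectors of the descriptive words
    labels = torch.randint(0, 172, (8,))
    logits, align_loss = model(img_feat, txt_feat)
    loss = F.cross_entropy(logits, labels) + 0.5 * align_loss
    loss.backward()

Note that because the text is privileged information, such a text branch would typically be available only at training time; how the semantic-space branch is used at inference is a design decision not recoverable from the abstract.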