Open Access

A Fine-Grained Image Classification Model Based on Hybrid Attention and Pyramidal Convolution

School of Computer Science, Qufu Normal University, Rizhao 276826, China

Abstract

The goal of fine-grained image classification (FGIC) is to distinguish subcategories within a larger category, and the key is to locate discriminative local regions of visual features. Most existing methods rely on traditional convolution operations for fine-grained image classification. However, traditional convolution cannot extract multi-scale features of an image, and existing methods are susceptible to interference from image background information. To address these problems, this paper proposes an FGIC model (Attention-PCNN) based on a hybrid attention mechanism and pyramidal convolution. The model feeds the multi-scale features extracted by the pyramidal convolutional neural network into two branches that capture global and local information, respectively. In particular, a hybrid attention mechanism is added to the global branch to reduce the interference of background information and make the model focus on the target regions containing fine-grained features. In addition, the mutual-channel loss (MC-Loss) is introduced in the local branch to capture fine-grained features. We evaluated the model on three publicly available datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft. The results show that Attention-PCNN outperforms state-of-the-art methods.

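To make the two-branch design concrete, the following PyTorch sketch assembles a pyramidal-convolution feature extractor, a CBAM-style hybrid attention module on the global branch, and a local branch whose output feature maps would be supervised by the mutual-channel loss during training. This is a minimal illustration under stated assumptions, not the authors' implementation: the stem, the layer widths, the kernel/group settings, the number of feature maps per class, and the names PyConv, HybridAttention, and AttentionPCNN are all assumptions, and the MC-Loss term itself is omitted.

```python
# Minimal sketch of the two-branch Attention-PCNN layout described in the abstract.
# All hyperparameters (channel widths, kernel/group sizes, maps per class) are
# illustrative assumptions; the mutual-channel loss itself is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyConv(nn.Module):
    """Pyramidal convolution: parallel branches with different kernel sizes
    and group counts, concatenated into one multi-scale feature map."""
    def __init__(self, in_ch, out_ch, kernels=(3, 5, 7, 9), groups=(1, 4, 8, 16)):
        super().__init__()
        split = out_ch // len(kernels)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, split, k, padding=k // 2, groups=g, bias=False)
            for k, g in zip(kernels, groups)
        ])

    def forward(self, x):
        return torch.cat([branch(x) for branch in self.branches], dim=1)


class HybridAttention(nn.Module):
    """CBAM-style channel + spatial attention used to suppress background responses."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(), nn.Linear(ch // reduction, ch)
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1).view(b, c))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1).view(b, c))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)              # channel attention
        s = torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True)[0]], dim=1)
        return x * torch.sigmoid(self.spatial(s))                     # spatial attention


class AttentionPCNN(nn.Module):
    """Two-branch head: global branch with hybrid attention; local branch whose
    per-class groups of feature maps are intended for MC-Loss supervision."""
    def __init__(self, num_classes, maps_per_class=2):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(3, stride=2, padding=1),
        )
        self.pyconv = PyConv(64, 256)
        self.attn = HybridAttention(256)
        self.global_head = nn.Linear(256, num_classes)
        self.local_conv = nn.Conv2d(256, num_classes * maps_per_class, kernel_size=1)

    def forward(self, x):
        feats = self.pyconv(self.stem(x))                  # multi-scale features
        g = self.attn(feats)                               # global branch
        global_logits = self.global_head(F.adaptive_avg_pool2d(g, 1).flatten(1))
        local_maps = self.local_conv(feats)                # local branch, fed to MC-Loss
        return global_logits, local_maps


if __name__ == "__main__":
    model = AttentionPCNN(num_classes=200)                 # e.g. CUB-200-2011
    logits, local_maps = model(torch.randn(2, 3, 224, 224))
    print(logits.shape, local_maps.shape)                  # (2, 200), (2, 400, 56, 56)
```

In training, the cross-entropy loss on the global logits would presumably be combined with the mutual-channel loss on the local feature maps; the weighting between the two terms is not specified here.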
Tsinghua Science and Technology, pages 1283-1293
Cite this article:
Wang S, Li S, Li A, et al. A Fine-Grained Image Classification Model Based on Hybrid Attention and Pyramidal Convolution. Tsinghua Science and Technology, 2025, 30(3): 1283-1293. https://doi.org/10.26599/TST.2024.9010025

Received: 22 November 2023
Revised: 19 December 2023
Accepted: 22 January 2024
Published: 30 December 2024
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
