A Pixel–Channel Hybrid Attention Model for Image Processing

Qiang Hua, Liyou Chen, Pan Li, Shipeng Zhao, and Yan Li
College of Mathematics and Information Science, Hebei University, Baoding 071002, China
Research Center for Applied Mathematics and Interdisciplinary Sciences, Beijing Normal University, Zhuhai 519087, China

Abstract

In image processing, better results are often achieved by deepening neural networks, which introduces considerably more parameters. In image classification, improving accuracy without adding too many parameters remains a challenge. In image conversion, translation models based on generative adversarial networks often produce semantic artifacts, yielding images of lower quality. To address these problems, this paper proposes a new attention module built on a pixel–channel hybrid attention (PCHA) mechanism, which combines attention information from the pixel and channel domains. Comparative results with different attention modules on multiple image datasets verify the superiority of the PCHA module on classification tasks. For image conversion, we further propose a skip structure (the S-PCHA model) that connects the down- and up-sampling paths on the basis of the PCHA module. This structure effectively realizes the intercommunication of encoder and decoder information and thereby helps the algorithm identify the most distinctive semantic objects in a given image. The results also show that the proposed attention model establishes a more realistic mapping from the source domain to the target domain in the image conversion algorithm, improving the quality of the images generated by the conversion model.
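
To make the architecture described above more concrete, the following PyTorch sketch shows one way a pixel-channel hybrid attention block and an S-PCHA-style skip connection could be assembled. It is an illustrative reconstruction from the abstract only: the class names, the channel-reduction ratio, the sigmoid gating, and the fusion order (channel attention followed by pixel attention, plus a residual path) are assumptions, not the authors' implementation.

# A minimal PyTorch sketch of a pixel-channel hybrid attention block (assumptions noted in comments).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel-domain attention: global average pooling followed by a bottleneck MLP gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Scale each channel of x by its learned importance weight.
        return x * self.mlp(self.pool(x))

class PixelAttention(nn.Module):
    """Pixel-domain attention: a 1x1 convolution produces a per-pixel gate."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Scale each spatial position of x by its learned importance weight.
        return x * self.conv(x)

class PCHABlock(nn.Module):
    """Hybrid block (assumed fusion order): channel attention, then pixel attention, plus a residual path."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_att = ChannelAttention(channels, reduction)
        self.pixel_att = PixelAttention(channels)

    def forward(self, x):
        return self.pixel_att(self.channel_att(x)) + x

class SkipPCHA(nn.Module):
    """S-PCHA-style skip (assumption): attend over encoder features before concatenating them with the matching decoder features."""
    def __init__(self, channels: int):
        super().__init__()
        self.att = PCHABlock(channels)

    def forward(self, enc_feat, dec_feat):
        return torch.cat([self.att(enc_feat), dec_feat], dim=1)

if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)        # dummy feature map: batch 2, 64 channels, 32x32
    print(PCHABlock(64)(feats).shape)         # torch.Size([2, 64, 32, 32])
    print(SkipPCHA(64)(feats, feats).shape)   # torch.Size([2, 128, 32, 32])

In this sketch, placing the attention block on the encoder side of the skip connection is one plausible reading of "intercommunication of encoder and decoder information"; the paper's actual wiring may differ.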

Keywords: deep learning, image classification, convolutional neural networks, image processing, attention mechanism


Publication history

Received: 08 June 2021
Accepted: 30 July 2021
Published: 17 March 2022
Issue date: October 2022

Copyright

© The author(s) 2022.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (No. 61976141), the Natural Science Foundation of Hebei Province (Nos. F2018201096 and F2018201115), the Natural Science Foundation of Guangdong Province (No. 2018A0303130026), and the Key Foundation of the Education Department of Hebei Province (No. ZD2019021).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
