A Pixel–Channel Hybrid Attention Model for Image Processing

Qiang Hua; Liyou Chen; Pan Li; Shipeng Zhao; Yan Li

doi:10.26599/TST.2021.9010054


Open Access


College of Mathematics and Information Science, Hebei University, Baoding 071002, China

Research Center for Applied Mathematics and Interdisciplinary Sciences, Beijing Normal University, Zhuhai 519087, China


Abstract

In image processing, better results can often be achieved by deepening neural networks, but deeper networks involve considerably more parameters. In image classification, improving accuracy without introducing too many parameters remains a challenge. In image conversion, models based on generative adversarial networks (GANs) often produce semantic artifacts, which lowers the quality of the generated images. To address these problems, this paper proposes a new type of attention module. The proposed approach uses a pixel–channel hybrid attention (PCHA) mechanism, which combines attention information from the pixel and channel domains. Comparisons of different attention modules on multiple image datasets verify the superiority of the PCHA module in classification tasks. For image conversion, we propose a skip structure in the up- and down-sampling processes based on the PCHA module (the S-PCHA model). Because this structure effectively enables information exchange between the encoder and the decoder, it can help the algorithm identify the most distinctive semantic object in a given image. Furthermore, the results show that the attention model can establish a more realistic mapping from the source domain to the target domain in the image conversion algorithm, thus improving the quality of the images generated by the conversion model.
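The abstract does not give the PCHA equations, so the following is only a minimal sketch under stated assumptions, not the authors' module. It assumes an SE-style channel branch (global average pooling, a linear map, a sigmoid) and a parameter-free pixel branch (mean over channels, then a sigmoid), combined by multiplying each activation by both weights. It then imitates the S-PCHA idea with a U-Net-style skip that concatenates attended encoder features with decoder features. The names `channel_attention`, `pixel_attention`, `hybrid_attention`, `attention_skip`, and the weight matrix `fc` are hypothetical. Plain Python lists are used so the example runs without a deep-learning framework.

```python
import math
from typing import List

# Assumption: a feature map is a nested list indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(x: FeatureMap, fc: List[List[float]]) -> List[float]:
    """One weight per channel: global average pool, a linear map, then a sigmoid.

    `fc` is a hypothetical C x C matrix standing in for learned FC layers.
    """
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    return [sigmoid(sum(w * p for w, p in zip(row, pooled))) for row in fc]


def pixel_attention(x: FeatureMap) -> List[List[float]]:
    """One weight per pixel: mean over channels, then a sigmoid (no parameters here)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [[sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]


def hybrid_attention(x: FeatureMap, fc: List[List[float]]) -> FeatureMap:
    """Rescale each activation by its channel weight times its pixel weight."""
    cw = channel_attention(x, fc)
    pw = pixel_attention(x)
    return [[[x[k][i][j] * cw[k] * pw[i][j] for j in range(len(pw[0]))]
             for i in range(len(pw))]
            for k in range(len(x))]


def attention_skip(encoder: FeatureMap, decoder: FeatureMap,
                   fc: List[List[float]]) -> FeatureMap:
    """Hypothetical S-PCHA-style skip: attend to the encoder features, then
    concatenate them with the decoder features along the channel axis."""
    if (len(encoder[0]), len(encoder[0][0])) != (len(decoder[0]), len(decoder[0][0])):
        raise ValueError("encoder and decoder maps must share spatial size")
    return hybrid_attention(encoder, fc) + decoder


if __name__ == "__main__":
    x = [[[1.0, 2.0], [3.0, 4.0]],
         [[0.0, -1.0], [1.0, 0.0]]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(hybrid_attention(x, identity))
    print(len(attention_skip(x, x, identity)))  # 4 channels after concatenation
```

Multiplying the two weight maps is only one way to combine pixel and channel attention; applying the branches sequentially, or adding their outputs, are equally plausible readings of "hybrid", and the abstract does not say which PCHA uses.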

Keywords

deep learning; image classification; convolutional neural networks; image processing; attention mechanism


Tsinghua Science and Technology, Volume 27, Issue 5, October 2022, Pages 804-816

Cite this article:

Hua Q, Chen L, Li P, et al. A Pixel–Channel Hybrid Attention Model for Image Processing. Tsinghua Science and Technology, 2022, 27(5): 804-816. https://doi.org/10.26599/TST.2021.9010054


Received: 08 June 2021

Accepted: 30 July 2021

Published: 17 March 2022

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).