Open Access

Adversarial Attack on Object Detection via Object Feature-Wise Attention and Perturbation Extraction

School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China
National Key Laboratory of Science and Technology on Automatic Target Recognition, National University of Defense Technology, Changsha 410073, China

Abstract

Deep neural networks are widely used in computer vision tasks, but they are vulnerable to adversarial samples, which can sharply degrade recognition accuracy. Although traditional algorithms for crafting adversarial samples attack classification models effectively, their performance degrades against object detection models, whose structures are more complex. To address this issue, we first analyze the multi-scale feature extraction mechanism of object detection models, and then propose a novel adversarial sample generation algorithm for attacking detection models, built from an object feature-wise attention module and a perturbation extraction module. In the first module, we compute the noise distribution in the object region from the multi-scale feature map, which narrows the range of the perturbation and improves the stealthiness of the adversarial samples. In the second module, we feed the noise distribution into a generative adversarial network to produce adversarial perturbations with strong attack transferability. Together, the two modules enable the proposed approach to better confuse the judgment of detection models. Experiments on the DroneVehicle dataset show that our method is computationally efficient and attacks detection models effectively under both qualitative and quantitative analysis.
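To make the two-module pipeline concrete, below is a minimal PyTorch sketch of the idea the abstract describes: a spatial attention map derived from multi-scale features confines the noise to object regions, and a generator produces the perturbation from that masked input. This is an illustration only, not the authors' implementation; it assumes FPN-style feature maps with a shared channel width, a sigmoid attention head, a small encoder-decoder generator, and an L-infinity perturbation budget, and all names (ObjectFeatureWiseAttention, PerturbationGenerator, craft_adversarial) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectFeatureWiseAttention(nn.Module):
    """Hypothetical sketch: collapse multi-scale feature maps into one
    spatial attention map that concentrates on object regions."""
    def __init__(self, in_channels):
        super().__init__()
        # Shared 1x1 head; assumes all scales have `in_channels` channels,
        # as in an FPN-style backbone.
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feats, out_size):
        # feats: list of feature maps, each (B, C, H_i, W_i) at a scale.
        maps = []
        for f in feats:
            a = torch.sigmoid(self.conv(f))            # (B, 1, H_i, W_i)
            maps.append(F.interpolate(a, size=out_size, mode="bilinear",
                                      align_corners=False))
        # Average the per-scale maps into one object-region mask.
        return torch.stack(maps, dim=0).mean(dim=0)

class PerturbationGenerator(nn.Module):
    """Hypothetical generator mapping the masked noise distribution
    to an image-sized perturbation in [-1, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

def craft_adversarial(image, feats, attention, generator, eps=8 / 255):
    # 1. Object feature-wise attention: restrict noise to object regions.
    mask = attention(feats, out_size=image.shape[-2:])  # (B, 1, H, W)
    # 2. Perturbation extraction: generate a perturbation from the masked
    #    image, confine it to the attended region, clamp to the budget.
    delta = torch.clamp(generator(image * mask) * mask, -eps, eps)
    return torch.clamp(image + delta, 0.0, 1.0)
```

In the full method the generator would be trained adversarially, jointly against a discriminator and the victim detector's loss; that training loop is omitted here for brevity.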

Tsinghua Science and Technology
Pages 1174-1189
Cite this article:
Xue W, Xia X, Wan P, et al. Adversarial Attack on Object Detection via Object Feature-Wise Attention and Perturbation Extraction. Tsinghua Science and Technology, 2025, 30(3): 1174-1189. https://doi.org/10.26599/TST.2024.9010029


Received: 27 July 2023
Revised: 24 December 2023
Accepted: 26 January 2024
Published: 30 December 2024
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
