A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16×16 words: Transformers for image recognition at scale, in Proc. Int. Conf. on Learning Representations, Virtual, 2021.
C. F. R. Chen, Q. Fan, and R. Panda, CrossViT: Cross-attention multi-scale vision transformer for image classification, in Proc. IEEE Int. Conf. on Computer Vision, Montreal, Canada, 2021, pp. 347–356.
C. J. Holder and M. Shafique, On efficient real-time semantic segmentation: A survey, arXiv preprint arXiv: 2206.08605, 2022.
X. Wang, L. Bo, and F. Li, Adaptive wing loss for robust face alignment via heatmap regression, in Proc. IEEE Int. Conf. on Computer Vision, Seoul, Republic of Korea, 2019, pp. 6970–6980.
C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, Intriguing properties of neural networks, in Proc. Int. Conf. on Learning Representations, Banff, Canada, 2014.
D. Jin, Z. Jin, J.T. Zhou, and P. Szolovits, Is BERT really robust? A strong baseline for natural language attack on text classification and entailment, in Proc. AAAI Conf. on Artificial Intelligence, New York, NY, USA, 2020, pp. 8018–8025.
J. Yu, M. Gao, H. Yin, J. Li, C. Gao, and Q. Wang, Generating reliable friends via adversarial training to improve social recommendation, in Proc. IEEE Int. Conf. on Data Mining, Beijing, China, 2019, pp. 768–777.
G. Beigi, A. Mosallanezhad, R. Guo, H. Alvari, A. Nou, and H. Liu, Privacy-aware recommendation with private-attribute protection using adversarial learning, in Proc. ACM Int. Conf. on Web Search and Data Mining, Houston, TX, USA, 2020, pp. 34–42.
I. Goodfellow, J. Shlens, and C. Szegedy, Explaining and harnessing adversarial examples, in Proc. Int. Conf. on Learning Representations, San Diego, CA, USA, 2015, pp. 7–9.
A. Kurakin, I. Goodfellow, and S. Bengio, Adversarial machine learning at scale, arXiv preprint arXiv: 1611.01236, 2016.
Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, Boosting adversarial attacks with momentum, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 9185–9193.
N. Carlini and D. Wagner, Towards evaluating the robustness of neural networks, in Proc. IEEE Symposium on Security and Privacy, San Jose, CA, USA, 2017, pp. 39–57.
H. Li, X. Xu, X. Zhang, S. Yang, and B. Li, QEBA: Query-efficient boundary-based blackbox attack, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Seattle, WA, USA, 2020, pp. 1221–1230.
T. Maho, T. Furon, and E. Le Merrer, SurFree: A fast surrogate-free black-box attack, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2021, pp. 10430–10439.
J. Fu, J. Sun, and G. Wang, Boosting black-box adversarial attacks with meta learning, in Proc. Chinese Control Conference, Hefei, China, 2022, pp. 7308–7313.
J. Lu, H. Sibai, and E. Fabry, Adversarial examples that fool detectors, arXiv preprint arXiv: 1712.02494, 2017.
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A.C. Berg, SSD: Single shot multibox detector, in Proc. European Conf. on Computer Vision, Amsterdam, The Netherlands, 2016, pp. 21–37.
J. Redmon and A. Farhadi, YOLOv3: An incremental improvement, arXiv preprint arXiv: 1804.02767, 2018.
A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, YOLOv4: Optimal speed and accuracy of object detection, arXiv preprint arXiv: 2004.10934, 2020.
R. Girshick, Fast R-CNN, in Proc. IEEE Int. Conf. on Computer Vision, Santiago, Chile, 2015, pp. 1440–1448.
S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, in Proc. Advances in Neural Information Processing Systems, Montreal, Canada, 2015, pp. 91–99.
A. Saha, A. Subramanya, K. Patil, and H. Pirsiavash, Role of spatial context in adversarial robustness for object detection, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 2020, pp. 784–785.
J. Bao, Sparse adversarial attack to object detection, arXiv preprint arXiv: 2012.13692, 2020.
S. Wu, T. Dai, and S.-T. Xia, DPAttack: Diffused patch attacks against universal object detection, arXiv preprint arXiv: 2010.11679, 2020.
A. Shapira, A. Zolfi, L. Demetrio, B. Biggio, and A. Shabtai, Phantom sponges: Exploiting non-maximum suppression to attack deep object detectors, in Proc. IEEE Winter Conf. on Applications of Computer Vision, Waikoloa, HI, USA, 2023, pp. 4571–4580.
C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille, Adversarial examples for semantic segmentation and object detection, in Proc. IEEE Int. Conf. on Computer Vision, Venice, Italy, 2017, pp. 1369–1378.
X. Wei, S. Liang, N. Chen, and X. Cao, Transferable adversarial attacks for image and video object detection, arXiv preprint arXiv: 1811.12641, 2018.
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial nets, in Proc. Advances in Neural Information Processing Systems, Montreal, Canada, 2014.
A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, arXiv preprint arXiv: 1511.06434, 2015.
X. Wang, X. He, J. Wang, and K. He, Admix: Enhancing the transferability of adversarial attacks, in Proc. IEEE Int. Conf. on Computer Vision, Montreal, Canada, 2021, pp. 16158–16167.
T.Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, Feature pyramid networks for object detection, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017, pp. 2117–2125.
K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv: 1409.1556, 2014.
K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp. 770–778.
C. Y. Wang, H. Y. M. Liao, Y. H. Wu, P. Y. Chen, J. W. Hsieh, and I. H. Yeh, CSPNet: A new backbone that can enhance learning capability of CNN, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Seattle, WA, USA, 2020, pp. 390–391.
C. Xiao, B. Li, J.Y. Zhu, W. He, M. Liu, and D. Song, Generating adversarial examples with adversarial networks, arXiv preprint arXiv: 1801.02610, 2018.
P. Isola, J.Y. Zhu, T. Zhou, and A.A. Efros, Image-to-image translation with conditional adversarial networks, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017, pp. 1125–1134.
P. Zhu, Y. Sun, L. Wen, Y. Feng, and Q. Hu, Drone based RGBT vehicle detection and counting: A challenge, arXiv preprint arXiv: 2003.02437, 2020.
D. Du, P. Zhu, L. Wen, X. Bian, H. Lin, Q. Hu, T. Peng, J. Zheng, X. Wang, Y. Zhang, et al., VisDrone-DET2019: The vision meets drone object detection in image challenge results, in Proc. IEEE Int. Conf. on Computer Vision Workshops, Seoul, Republic of Korea, 2019, pp. 213–226.
J. Redmon and A. Farhadi, YOLO9000: Better, faster, stronger, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017, pp. 7263–7271.
T.Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, Focal loss for dense object detection, in Proc. IEEE Int. Conf. on Computer Vision, Venice, Italy, 2017, pp. 2980–2988.
K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, J. Xu, et al., MMDetection: Open MMLab detection toolbox and benchmark, arXiv preprint arXiv: 1906.07155, 2019.
R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, in Proc. IEEE Int. Conf. on Computer Vision, Venice, Italy, 2017, pp. 618–626.