
CamDiff: Camouflage Image Augmentation via Diffusion

Xue-Jing Luo¹, Shuo Wang¹,², Zongwei Wu¹, Christos Sakaridis¹, Yun Cheng¹, Deng-Ping Fan¹ (corresponding author), Luc Van Gool¹
¹ Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich 8092, Switzerland
² School of Systems Science, Beijing Normal University, Beijing 100875, China

Abstract

The burgeoning field of Camouflaged Object Detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent learning-based models, their robustness is limited: existing methods may misclassify salient objects as camouflaged ones, even though the two have contradictory characteristics. This limitation may stem from the lack of multi-pattern training images, leading to reduced robustness against salient objects. To overcome this scarcity, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC). Specifically, we leverage a latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure that the synthesized objects match the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflaged scenes with richer characteristics. The results of user studies show that the salient objects in our synthesized scenes attract more of the user's attention; such samples thus pose a greater challenge to existing COD models. CamDiff enables flexible editing and efficient large-scale dataset generation at low cost. It significantly enhances the training and testing phases of COD baselines, granting them robustness across diverse domains. Our newly generated datasets and source code are available at https://github.com/drlxj/CamDiff.

Keywords: salient object detection, AI-generated content, diffusion model, camouflaged object detection
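To make the described pipeline concrete, below is a minimal sketch of the two-stage idea from the abstract: diffusion-based inpainting of a salient object into a camouflaged scene, followed by a CLIP zero-shot check that rejects failed syntheses. It assumes off-the-shelf HuggingFace diffusers and transformers checkpoints; the model names, prompt template, binary background label, and acceptance threshold are illustrative assumptions, not the authors' exact configuration (see the linked repository for the actual implementation).

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Latent diffusion inpainting model (assumed off-the-shelf checkpoint).
inpainter = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# CLIP for zero-shot verification of the synthesized region.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def synthesize_salient_object(scene: Image.Image, mask: Image.Image,
                              category: str, threshold: float = 0.5):
    """Inpaint `category` into the white region of `mask` within `scene`;
    return the edited image only if CLIP agrees the region depicts it."""
    scene = scene.convert("RGB").resize((512, 512))
    mask = mask.convert("L").resize((512, 512))
    prompt = f"a photo of a {category}"
    result = inpainter(prompt=prompt, image=scene, mask_image=mask).images[0]

    # Zero-shot check on the inpainted crop: does it match the prompt better
    # than a generic background label? (The binary labels and 0.5 threshold
    # are illustrative choices, not the paper's exact setting.)
    crop = result.crop(mask.getbbox())
    labels = [prompt, "a photo of background"]
    inputs = clip_proc(text=labels, images=crop,
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        probs = clip(**inputs).logits_per_image.softmax(dim=-1)[0]
    return result if probs[0].item() >= threshold else None  # None = reject
```

In such a setup, a rejected sample can simply be re-synthesized with a new random seed, so a failed edit never contaminates the dataset and the scene keeps its original camouflage annotation.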


Publication history

Received: 12 April 2023
Revised: 23 May 2023
Accepted: 06 October 2023
Published: 22 November 2023
Issue date: December 2023

Copyright

© The author(s) 2023.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
