
CamDiff: Camouflage Image Augmentation via Diffusion

Xue-Jing Luo¹, Shuo Wang¹,², Zongwei Wu¹, Christos Sakaridis¹, Yun Cheng¹, Deng-Ping Fan¹ (corresponding author), Luc Van Gool¹
¹ Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich 8092, Switzerland
² School of Systems Science, Beijing Normal University, Beijing 100875, China

Abstract

The burgeoning field of Camouflaged Object Detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent learning-based models, their robustness is limited: existing methods may misclassify salient objects as camouflaged ones, even though the two have contradictory characteristics. This limitation may stem from the lack of multi-pattern training images, leading to reduced robustness against salient objects. To overcome this scarcity, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC). Specifically, we leverage a latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure that the synthesized objects match the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflaged scenes with richer characteristics. The results of user studies show that the salient objects in our synthesized scenes attract more of the user's attention; such samples thus pose a greater challenge to existing COD models. CamDiff enables flexible editing and efficient large-scale dataset generation at low cost. It significantly enhances the training and testing phases of COD baselines, granting them robustness across diverse domains. Our newly generated datasets and source code are available at https://github.com/drlxj/CamDiff.

Keywords: salient object detection, AI-generated content, diffusion model, camouflaged object detection
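To make the described pipeline concrete, below is a minimal sketch of the two-stage idea from the abstract: diffusion-based inpainting of a salient object into a camouflaged scene, followed by a CLIP zero-shot check that rejects failed syntheses. It assumes off-the-shelf HuggingFace diffusers and transformers checkpoints; the model names, prompt template, binary background label, and acceptance threshold are illustrative assumptions, not the authors' exact configuration (see the linked repository for the actual implementation).

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Latent diffusion inpainting model (assumed off-the-shelf checkpoint).
inpainter = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# CLIP for zero-shot verification of the synthesized region.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def synthesize_salient_object(scene: Image.Image, mask: Image.Image,
                              category: str, threshold: float = 0.5):
    """Inpaint `category` into the white region of `mask` within `scene`;
    return the edited image only if CLIP agrees the region depicts it."""
    scene = scene.convert("RGB").resize((512, 512))
    mask = mask.convert("L").resize((512, 512))
    prompt = f"a photo of a {category}"
    result = inpainter(prompt=prompt, image=scene, mask_image=mask).images[0]

    # Zero-shot check on the inpainted crop: does it match the prompt better
    # than a generic background label? (The binary labels and 0.5 threshold
    # are illustrative choices, not the paper's exact setting.)
    crop = result.crop(mask.getbbox())
    labels = [prompt, "a photo of background"]
    inputs = clip_proc(text=labels, images=crop,
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        probs = clip(**inputs).logits_per_image.softmax(dim=-1)[0]
    return result if probs[0].item() >= threshold else None  # None = reject
```

In such a setup, a rejected sample can simply be re-synthesized with a new random seed, so a failed edit never contaminates the dataset and the scene keeps its original camouflage annotation.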


Publication history

Received: 12 April 2023
Revised: 23 May 2023
Accepted: 06 October 2023
Published: 22 November 2023
Issue date: December 2023

Copyright

© The author(s) 2023.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
