
A Review of Disentangled Representation Learning for Remote Sensing Data

Mi Wang¹, Huiwen Wang¹, Jing Xiao², Liang Liao³
¹ State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430072, China
² School of Computer Science, Wuhan University, Wuhan 430072, China
³ School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore

Abstract

Representation learning is one of the core problems in machine learning research. The shift of input representations for machine learning algorithms from the handcrafted features that dominated in the past to latent representations learned by deep neural networks has led to tremendous improvements in algorithm performance. However, the learned representations are usually highly entangled, i.e., all information components of the input data are encoded into the same feature space, where they interfere with one another and are difficult to distinguish. Disentangled representation learning aims to learn a low-dimensional, interpretable abstract representation that can identify and isolate the different latent variables hidden in high-dimensional observations. Such a representation captures the information of a single factor of variation and controls it through the corresponding latent subspace, providing a representation that is robust to complex changes in the data. In this paper, we first introduce and analyze the current state of research on disentangled representation and its causal mechanisms, and we summarize three crucial properties of disentangled representations. Then, disentangled representation learning algorithms are classified into four categories and outlined in terms of both their mathematical description and their applicability. Subsequently, the loss functions and objective evaluation metrics commonly used in existing work on disentangled representation are categorized. Finally, the paper summarizes representative applications of disentangled representation learning in the field of remote sensing and discusses its future development.

Keywords: deep learning, disentangled representation learning, latent representation, remote sensing data
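
To make the idea of controlling a single factor of variation through a latent subspace concrete, the sketch below implements the β-VAE objective of Higgins et al., a representative disentangling method from the literature this review surveys. It is a minimal illustration rather than code from the paper; the PyTorch framing, layer sizes, and all names are assumptions made for the example.

    # Minimal beta-VAE sketch (illustrative; not code from the paper).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BetaVAE(nn.Module):
        """Toy encoder/decoder on flattened inputs; sizes are arbitrary."""
        def __init__(self, x_dim=784, z_dim=10):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * z_dim))
            self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, x_dim))

        def forward(self, x):
            mu, logvar = self.enc(x).chunk(2, dim=-1)             # Gaussian posterior params
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
            return self.dec(z), mu, logvar

    def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
        # Reconstruction term: -log p(x|z) up to a constant.
        recon = F.mse_loss(x_hat, x, reduction="sum")
        # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian posterior.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        # beta > 1 strengthens the pull toward the factorized prior, which
        # empirically encourages one factor of variation per latent dimension.
        return recon + beta * kl

    model = BetaVAE()
    x = torch.rand(32, 784)                                       # dummy batch
    x_hat, mu, logvar = model(x)
    beta_vae_loss(x, x_hat, mu, logvar).backward()

Setting beta = 1 recovers the standard variational autoencoder; raising beta trades reconstruction fidelity for stronger disentanglement, a trade-off that later methods such as FactorVAE and β-TCVAE were designed to soften.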


Publication history

Received: 30 September 2022
Revised: 12 December 2022
Accepted: 17 January 2023
Published: 10 March 2023
Issue date: December 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61825103 and 62202349), the Natural Science Foundation of Hubei Province (Nos. 2022CFB352 and 2020CFA001), and the Key Research and Development Program of Hubei Province (No. 2020BIB006).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
