
A Review of Disentangled Representation Learning for Remote Sensing Data

Mi Wang¹, Huiwen Wang¹, Jing Xiao², Liang Liao³
¹ State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430072, China
² School of Computer Science, Wuhan University, Wuhan 430072, China
³ School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore

Abstract

Representation learning is one of the core problems in machine learning research. The shift of input representations for machine learning algorithms from the handcrafted features that dominated in the past to latent representations learned by deep neural networks has led to tremendous improvements in algorithm performance. However, the learned representations are usually highly entangled, i.e., all information components of the input data are encoded into the same feature space, where they interfere with one another and are difficult to distinguish. Disentangled representation learning aims to learn a low-dimensional, interpretable abstract representation that can identify and isolate the different latent variables hidden in high-dimensional observations. Such a representation captures the information of a single factor of variation and controls it through the corresponding latent subspace, providing a representation that is robust to complex changes in the data. In this paper, we first introduce and analyze the current state of research on disentangled representation and its causal mechanisms, and we summarize three crucial properties of disentangled representations. Then, disentangled representation learning algorithms are classified into four categories and outlined in terms of both their mathematical description and their applicability. Subsequently, the loss functions and objective evaluation metrics commonly used in existing work on disentangled representation are categorized. Finally, the paper summarizes representative applications of disentangled representation learning in the field of remote sensing and discusses its future development.

Keywords: deep learning, disentangled representation learning, latent representation, remote sensing data
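
To make the idea of controlling a single factor of variation through a latent subspace concrete, the sketch below implements the β-VAE objective of Higgins et al., a representative disentangling method from the literature this review surveys. It is a minimal illustration rather than code from the paper; the PyTorch framing, layer sizes, and all names are assumptions made for the example.

    # Minimal beta-VAE sketch (illustrative; not code from the paper).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BetaVAE(nn.Module):
        """Toy encoder/decoder on flattened inputs; sizes are arbitrary."""
        def __init__(self, x_dim=784, z_dim=10):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * z_dim))
            self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, x_dim))

        def forward(self, x):
            mu, logvar = self.enc(x).chunk(2, dim=-1)             # Gaussian posterior params
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
            return self.dec(z), mu, logvar

    def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
        # Reconstruction term: -log p(x|z) up to a constant.
        recon = F.mse_loss(x_hat, x, reduction="sum")
        # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian posterior.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        # beta > 1 strengthens the pull toward the factorized prior, which
        # empirically encourages one factor of variation per latent dimension.
        return recon + beta * kl

    model = BetaVAE()
    x = torch.rand(32, 784)                                       # dummy batch
    x_hat, mu, logvar = model(x)
    beta_vae_loss(x, x_hat, mu, logvar).backward()

Setting beta = 1 recovers the standard variational autoencoder; raising beta trades reconstruction fidelity for stronger disentanglement, a trade-off that later methods such as FactorVAE and β-TCVAE were designed to soften.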


Publication history

Received: 30 September 2022
Revised: 12 December 2022
Accepted: 17 January 2023
Published: 10 March 2023
Issue date: December 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61825103 and 62202349), the Natural Science Foundation of Hubei Province (Nos. 2022CFB352 and 2020CFA001), and the Key Research and Development Program of Hubei Province (No. 2020BIB006).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
