References
[1]
Zhang, H.; Xu, T.; Li, H. S.; Zhang, S. T.; Wang, X. G.; Huang, X. L.; Metaxas, D. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, 5908–5916, 2017.
[2]
Qiao, T. T.; Zhang, J.; Xu, D. Q.; Tao, D. C. MirrorGAN: Learning text-to-image generation by redescription. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1505–1514, 2019.
[3]
Zhu, M. F.; Pan, P. B.; Chen, W.; Yang, Y. DM-GAN: Dynamic memory generative adversarial networks for text-to-image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5795–5803, 2019.
[4]
Zhang, H.; Koh, J. Y.; Baldridge, J.; Lee, H.; Yang, Y. F. Cross-modal contrastive learning for text-to-image generation. arXiv preprint arXiv:2101.04702, 2021.
[5]
Karras, T.; Laine, S.; Aila, T. M. A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4396–4405, 2019.
[6]
Sangkloy, P.; Lu, J. W.; Fang, C.; Yu, F.; Hays, J. Scribbler: Controlling deep image synthesis with sketch and color. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6836–6845, 2017.
[7]
Ghosh, A.; Zhang, R.; Dokania, P.; Wang, O.; Efros, A.; Torr, P.; Shechtman, E. Interactive sketch & fill: Multiclass sketch-to-image translation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 1171–1180, 2019.
[8]
Gao, C. Y.; Liu, Q.; Xu, Q.; Wang, L. M.; Liu, J. Z.; Zou, C. Q. SketchyCOCO: Image generation from freehand scene sketches. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5173–5182, 2020.
[9]
Liu, R.; Yu, Q.; Yu, S. Unsupervised sketch-to-photo synthesis. arXiv preprint arXiv:1909.08313, 2019.
[10]
Li, J. N.; Yang, J. M.; Hertzmann, A.; Zhang, J. M.; Xu, T. F. LayoutGAN: Generating graphic layouts with wireframe discriminators. arXiv preprint arXiv:1901.06767, 2019.
[11]
Xue, Y.; Zhou, Z. H.; Huang, X. L. Neural wireframe renderer: Learning wireframe to image translations. In: Computer Vision – ECCV 2020. Lecture Notes in Computer Science, Vol. 12371. Vedaldi, A.; Bischof, H.; Brox, T.; Frahm, J. M. Eds. Springer Cham, 279–295, 2020.
[12]
Wang, M.; Lyu, X. Q.; Li, Y. J.; Zhang, F. L. VR content creation and exploration with deep learning: A survey. Computational Visual Media Vol. 6, No. 1, 3–28, 2020.
[13]
Kipf, T. N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[14]
Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. In: Proceedings of the 29th International Conference on Neural Information Processing Systems, 2234–2242, 2016.
[15]
Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826, 2016.
[16]
Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, 6629–6640, 2017.
[17]
Wang, Z.; Bovik, A. C.; Sheikh, H. R.; Simoncelli, E. P. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing Vol. 13, No. 4, 600–612, 2004.
[18]
Wang, Z.; Simoncelli, E. P.; Bovik, A. C. Multiscale structural similarity for image quality assessment. In: Proceedings of the 37th Asilomar Conference on Signals, Systems & Computers, 1398–1402, 2003.
[19]
Zhang, R.; Isola, P.; Efros, A. A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 586–595, 2018.
[20]
Rezende, D. J.; Mohamed, S. Variational inference with normalizing flows. In: Proceedings of the International Conference on Machine Learning, 1530–1538, 2015.
[21]
Kingma, D. P.; Dhariwal, P. Glow: Generative flow with invertible 1×1 convolutions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, 10215–10224, 2018.
[22]
Oord, A. V. D.; Kalchbrenner, N.; Kavukcuoglu, K. Pixel recurrent neural networks. In: Proceedings of the International Conference on Machine Learning, 1747–1756, 2016.
[23]
Oord, A. V. D.; Kalchbrenner, N.; Espeholt, L.; Kavukcuoglu, K.; Vinyals, O.; Graves, A. Conditional image generation with pixelCNN decoders. In: Proceedings of the 29th International Conference on Neural Information Processing Systems, 4790–4798, 2016.
[24]
Salimans, T.; Karpathy, A.; Chen, X.; Kingma, D. P. PixelCNN++: Improving the pixelCNN with discretized logistic mixture likelihood and other modifications. arXiv preprint arXiv:1701.05517, 2017.
[25]
Xu, T.; Zhang, P. C.; Huang, Q. Y.; Zhang, H.; Gan, Z.; Huang, X. L.; He, X. D. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1316–1324, 2018.
[26]
Lu, Y. Y.; Wu, S. Z.; Tai, Y. W.; Tang, C. K. Image generation from sketch constraint using contextual GAN. In: Computer Vision – ECCV 2018. Lecture Notes in Computer Science, Vol. 11220. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 213–228, 2018.
[27]
Ma, L.; Jia, X.; Sun, Q.; Schiele, B.; Tuytelaars, T.; Van Gool, L. Pose guided person image generation. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, 406–416, 2017.
[28]
Ma, L. Q.; Sun, Q. R.; Georgoulis, S.; Van Gool, L.; Schiele, B.; Fritz, M. Disentangled person image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 99–108, 2018.
[29]
Siarohin, A.; Sangineto, E.; Lathuilière, S.; Sebe, N. Deformable GANs for pose-based human image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3408–3416, 2018.
[30]
Song, S. J.; Zhang, W.; Liu, J. Y.; Mei, T. Unsupervised person image generation with semantic parsing transformation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2352–2361, 2019.
[31]
Zhu, Z.; Huang, T. T.; Shi, B. G.; Yu, M.; Wang, B. F.; Bai, X. Progressive pose attention transfer for person image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2342–2351, 2019.
[32]
Belongie, S.; Malik, J.; Puzicha, J. Shape context: A new descriptor for shape matching and object recognition. In: Proceedings of the International Conference on Neural Information Processing Systems, 831–837, 2000.
[33]
Chen, T.; Cheng, M. M.; Tan, P.; Shamir, A.; Hu, S. M. Sketch2Photo: Internet image montage. ACM Transactions on Graphics Vol. 28, No. 5, Article No. 124, 2009.
[34]
Goodfellow, I. J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, 2672–2680, 2014.
[35]
Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[36]
Miyato, T.; Koyama, M. cGANs with projection discriminator. In: Proceedings of the International Conference on Learning Representations, 2018.
[37]
Odena, A.; Olah, C.; Shlens, J. Conditional image synthesis with auxiliary classifier GANs. In: Proceedings of the International Conference on Machine Learning, 2642–2651, 2017.
[38]
Kingma, D. P.; Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[39]
Sohn, K.; Lee, H.; Yan, X. Learning structured output representation using deep conditional generative models. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, 3483–3491, 2015.
[40]
Klys, J.; Snell, J.; Zemel, R. Learning latent subspaces in variational autoencoders. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, 6444–6454, 2018.
[41]
Ivanov, O.; Figurnov, M.; Vetrov, D. Variational autoencoder with arbitrary conditioning. In: Proceedings of the International Conference on Learning Representations, 2018.
[42]
Larsen, A. B. L.; Sønderby, S. K.; Larochelle, H.; Winther, O. Autoencoding beyond pixels using a learned similarity metric. In: Proceedings of the International Conference on Machine Learning, 1558–1566, 2016.
[43]
Bao, J. M.; Chen, D.; Wen, F.; Li, H. Q.; Hua, G. C. VAE-GAN: Fine-grained image generation through asymmetric training. In: Proceedings of the IEEE International Conference on Computer Vision, 2764–2773, 2017.
[44]
Nilsback, M. E.; Zisserman, A. Automated flower classification over a large number of classes. In: Proceedings of the 6th Indian Conference on Computer Vision, Graphics & Image Processing, 722–729, 2008.
[45]
Welinder, P.; Branson, S.; Mita, T.; Wah, C.; Schroff, F.; Belongie, S.; Perona, P. Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001. California Institute of Technology, 2010.
[46]
Lin, T. Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. L. Microsoft COCO: Common objects in context. In: Computer Vision – ECCV 2014. Lecture Notes in Computer Science, Vol. 8693. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 740–755, 2014.
[47]
Reed, S.; Akata, Z.; Yan, X.; Logeswaran, L.; Schiele, B.; Lee, H. Generative adversarial text to image synthesis. In: Proceedings of the International Conference on Machine Learning, 1060–1069, 2016.
[48]
Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[49]
Mansimov, E.; Parisotto, E.; Ba, J. L.; Salakhutdinov, R. Generating images from captions with attention. arXiv preprint arXiv:1511.02793, 2015.
[50]
Zhang, H.; Xu, T.; Li, H. S.; Zhang, S. T.; Wang, X. G.; Huang, X. L.; Metaxas, D. N. StackGAN++: Realistic image synthesis with stacked generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 41, No. 8, 1947–1962, 2019.
[51]
Zhang, Z. Z.; Xie, Y. P.; Yang, L. Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6199–6208, 2018.
[52]
Isola, P.; Zhu, J. Y.; Zhou, T. H.; Efros, A. A. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5967–5976, 2017.
[53]
Yin, G. J.; Liu, B.; Sheng, L.; Yu, N. H.; Wang, X. G.; Shao, J. Semantics disentangling for text-to-image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2322–2331, 2019.
[54]
Reed, S. E.; Akata, Z.; Mohan, S.; Tenka, S.; Schiele, B.; Lee, H. Learning what and where to draw. In: Proceedings of the 29th International Conference on Neural Information Processing Systems, 217–225, 2016.
[55]
Hong, S.; Yang, D. D.; Choi, J.; Lee, H. Inferring semantic layout for hierarchical text-to-image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7986–7994, 2018.
[56]
Chen, Q. F.; Koltun, V. Photographic image synthesis with cascaded refinement networks. In: Proceedings of the IEEE International Conference on Computer Vision, 1520–1529, 2017.
[57]
Li, W. B.; Zhang, P. C.; Zhang, L.; Huang, Q. Y.; He, X. D.; Lyu, S. W.; Gao, J. F. Object-driven text-to-image synthesis via adversarial training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12166–12174, 2019.
[58]
Girshick, R. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, 1440–1448, 2015.
[59]
Johnson, J.; Gupta, A.; Li, F. F. Image generation from scene graphs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1219–1228, 2018.
[60]
Krishna, R.; Zhu, Y. K.; Groth, O.; Johnson, J.; Hata, K. J.; Kravitz, J.; Chen, S.; Kalantidis, Y.; Li, L.-J.; Shamma, D. A. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision Vol. 123, No. 1, 32–73, 2017.
[61]
Caesar, H.; Uijlings, J.; Ferrari, V. COCO-stuff: Thing and stuff classes in context. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1209–1218, 2018.
[62]
Hinz, T.; Heinrich, S.; Wermter, S. Generating multiple objects at spatially distinct locations. arXiv preprint arXiv:1901.00686, 2019.
[63]
Tan, F. W.; Feng, S.; Ordonez, V. Text2Scene: Generating compositional scenes from textual descriptions. arXiv preprint arXiv:1809.01110, 2018.
[64]
Bodla, N.; Hua, G.; Chellappa, R. Semi-supervised FusedGAN for conditional image generation. In: Computer Vision–ECCV 2018. Lecture Notes in Computer Science, Vol. 11209. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 689–704, 2018.
[65]
Hinz, T.; Heinrich, S.; Wermter, S. Semantic object accuracy for generative text-to-image synthesis. arXiv preprint arXiv:1910.13321, 2019.
[66]
Wang, T. C.; Liu, M. Y.; Zhu, J. Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8798–8807, 2018.
[67]
Eitz, M.; Richter, R.; Hildebrand, K.; Boubekeur, T.; Alexa, M. Photosketcher: Interactive sketch-based image synthesis. IEEE Computer Graphics and Applications Vol. 31, No. 6, 56–66, 2011.
[68]
Hu, S.-M.; Zhang, F.-L.; Wang, M.; Martin, R. R.; Wang, J. PatchNet: A patch-based image representation for interactive library-driven image editing. ACM Transactions on Graphics Vol. 32, No. 6, Article No. 196, 2013.
[69]
Wang, J. Y.; Zhao, Y.; Qi, Q.; Huo, Q. M.; Zou, J.; Ge, C.; Liao, J. MindCamera: Interactive sketch-based image retrieval and synthesis. IEEE Access Vol. 6, 3765–3773, 2018.
[70]
Turmukhambetov, D.; Campbell, N. D. F.; Goldman, D. B.; Kautz, J. Interactive sketch-driven image synthesis. Computer Graphics Forum Vol. 34, No. 8, 130–142, 2015.
[71]
Xie, S. N.; Tu, Z. W. Holistically-nested edge detection. In: Proceedings of the IEEE International Conference on Computer Vision, 1395–1403, 2015.
[72]
Winnemöller, H.; Kyprianidis, J. E.; Olsen, S. C. XDoG: An eXtended difference-of-Gaussians compendium including advanced image stylization. Computers & Graphics Vol. 36, No. 6, 740–753, 2012.
[73]
Kang, H.; Lee, S.; Chui, C. K. Coherent line drawing. In: Proceedings of the 5th International Symposium on Non-photorealistic Animation and Rendering, 43–50, 2007.
[74]
Li, Y. J.; Fang, C.; Hertzmann, A.; Shechtman, E.; Yang, M. H. Im2Pencil: Controllable pencil illustration from photographs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1525–1534, 2019.
[75]
Li, M. T.; Lin, Z.; Mech, R.; Yumer, E.; Ramanan, D. Photo-sketching: Inferring contour drawings from images. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 1403–1412, 2019.
[76]
Gastal, E. S. L.; Oliveira, M. M. Domain transform for edge-aware image and video processing. ACM Transactions on Graphics Vol. 30, No. 4, Article No. 69, 2011.
[77]
Hahn-Powell, G. V.; Archangeli, D. AutoTrace: An automatic system for tracing tongue contours. The Journal of the Acoustical Society of America Vol. 136, No. 4, 2104, 2014.
[78]
Simo-Serra, E.; Iizuka, S.; Sasaki, K.; Ishikawa, H. Learning to simplify. ACM Transactions on Graphics Vol. 35, No. 4, Article No. 121, 2016.
[79]
Chen, W. L.; Hays, J. SketchyGAN: Towards diverse and realistic sketch to image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9416–9425, 2018.
[80]
Li, Y. H.; Chen, X. J.; Wu, F.; Zha, Z. J. LinesToFacePhoto: Face photo generation from lines with conditional self-attention generative adversarial networks. In: Proceedings of the 27th ACM International Conference on Multimedia, 2323–2331, 2019.
[81]
Güçlütürk, Y.; Güçlü, U.; van Lier, R.; van Gerven, M. A. J. Convolutional sketch inversion. In: Computer Vision–ECCV 2016 Workshops. Lecture Notes in Computer Science, Vol. 9913. Hua, G.; Jégou, H. Eds. Springer Cham, 810–824, 2016.
[82]
Mescheder, L.; Geiger, A.; Nowozin, S. Which training methods for GANs do actually converge? arXiv preprint arXiv:1801.04406, 2018.
[83]
Huang, X.; Liu, M. Y.; Belongie, S.; Kautz, J. Multimodal unsupervised image-to-image translation. In: Computer Vision–ECCV 2018. Lecture Notes in Computer Science, Vol. 11207. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 179–196, 2018.
[84]
Portenier, T.; Hu, Q.; Szabó, A.; Bigdeli, S. A.; Favaro, P.; Zwicker, M. FaceShop: Deep sketch-based face image editing. arXiv preprint arXiv:1804.08972, 2018.
[85]
Xia, W.; Yang, Y.; Xue, J.-H. Calisketch: Stroke calibration and completion for high quality face image generation from poorly-drawn sketches. arXiv preprint arXiv:1911.00426, 2019.
[86]
Chen, S.-Y.; Su, W.; Gao, L.; Xia, S.; Fu, H. DeepFaceDrawing: Deep generation of face images from sketches. ACM Transactions on Graphics Vol. 39, No. 4, Article No. 72, 2020.
[87]
Sangkloy, P.; Burnell, N.; Ham, C.; Hays, J. The sketchy database. ACM Transactions on Graphics Vol. 35, No. 4, Article No. 119, 2016.
[88]
Eitz, M.; Hays, J.; Alexa, M. How do humans sketch objects? ACM Transactions on Graphics Vol. 31, No. 4, Article No. 44, 2012.
[89]
Caesar, H.; Uijlings, J.; Ferrari, V. COCO-stuff: Thing and stuff classes in context. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1209–1218, 2018.
[90]
Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In: Proceedings of the IEEE International Conference on Computer Vision, 1510–1519, 2017.
[91]
Zhu, P. H.; Abdal, R.; Qin, Y. P.; Wonka, P. SEAN: Image synthesis with semantic region-adaptive normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5103–5112, 2020.
[92]
Yu, Q.; Liu, F.; Song, Y. Z.; Xiang, T.; Hospedales, T. M.; Loy, C. C. Sketch me that shoe. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 799–807, 2016.
[93]
Krause, J.; Stark, M.; Deng, J.; Li, F. F. 3D object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, 554–561, 2013.
[94]
Yu, A.; Grauman, K. Fine-grained visual comparisons with local learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 192–199, 2014.
[95]
Yu, A.; Grauman, K. Semantic jitter: Dense supervision for visual comparisons via synthetic images. In: Proceedings of the IEEE International Conference on Computer Vision, 5571–5580, 2017.
[96]
Liu, Z. W.; Luo, P.; Wang, X. G.; Tang, X. O. Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision, 3730–3738, 2015.
[97]
Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
[98]
Wang, X. G.; Tang, X. O. Face photo-sketch synthesis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 31, No. 11, 1955–1967, 2009.
[99]
Johnson, M.; Brostow, G. J.; Shotton, J.; Arandjelovic, O.; Kwatra, V.; Cipolla, R. Semantic photo synthesis. Computer Graphics Forum Vol. 25, No. 3, 407–413, 2006.
[100]
Bansal, A.; Sheikh, Y.; Ramanan, D. Shapes and context: In-the-wild image synthesis & manipulation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2312–2321, 2019.
[101]
Chen, Q. F.; Koltun, V. Photographic image synthesis with cascaded refinement networks. In: Proceedings of the IEEE International Conference on Computer Vision, 1520–1529, 2017.
[102]
Lassner, C.; Pons-Moll, G.; Gehler, P. V. A generative model of people in clothing. In: Proceedings of the IEEE International Conference on Computer Vision, 853–862, 2017.
[103]
Park, T.; Liu, M. Y.; Wang, T. C.; Zhu, J. Y. Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2332–2341, 2019.
[104]
Liu, X.; Yin, G.; Shao, J.; Wang, X.; Li, H. Learning to predict layout-to-image conditional convolutions for semantic image synthesis. In: Proceedings of the 33rd Conference on Neural Information Processing Systems, 570–580, 2019.
[105]
Zhu, Z.; Xu, Z. L.; You, A. S.; Bai, X. Semantically multi-modal image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5466–5475, 2020.
[106]
Tang, H.; Xu, D.; Yan, Y.; Torr, P. H. S.; Sebe, N. Local class-specific and global image-level generative adversarial networks for semantic-guided scene generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7867–7876, 2020.
[107]
Qi, X. J.; Chen, Q. F.; Jia, J. Y.; Koltun, V. Semi-parametric image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8808–8816, 2018.
[108]
Wang, M.; Yang, G. Y.; Li, R. L.; Liang, R. Z.; Zhang, S. H.; Hall, P. M.; Hu, S.-M. Example-guided style-consistent image synthesis from semantic labeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1495–1504, 2019.
[109]
Liang, X. D.; Liu, S.; Shen, X. H.; Yang, J. C.; Liu, L. Q.; Dong, J.; Lin, L.; Yan, S. C. Deep human parsing with active template regression. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 37, No. 12, 2402–2414, 2015.
[110]
Liang, X. D.; Xu, C. Y.; Shen, X. H.; Yang, J. C.; Liu, S.; Tang, J. H.; Lin, L.; Yan, S. C. Human parsing with contextualized convolutional neural network. In: Proceedings of the IEEE International Conference on Computer Vision, 1386–1394, 2015.
[111]
Liu, Z. W.; Luo, P.; Qiu, S.; Wang, X. G.; Tang, X. O. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1096–1104, 2016.
[112]
Lee, C. H.; Liu, Z. W.; Wu, L. Y.; Luo, P. MaskGAN: Towards diverse and interactive facial image manipulation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5548–5557, 2020.
[113]
Zhou, B. L.; Zhao, H.; Puig, X.; Fidler, S.; Barriuso, A.; Torralba, A. Semantic understanding of scenes through the ADE20K dataset. arXiv preprint arXiv:1608.05442, 2016.
[114]
Zhou, B. L.; Zhao, H.; Puig, X.; Fidler, S.; Barriuso, A.; Torralba, A. Scene parsing through ADE20K dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5122–5130, 2017.
[115]
Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from RGBD images. In: Computer Vision – ECCV 2012. Lecture Notes in Computer Science, Vol. 7576. Fitzgibbon, A.; Lazebnik, S.; Perona, P.; Sato, Y.; Schmid, C. Eds. Springer Berlin Heidelberg, 746–760, 2012.
[116]
Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3213–3223, 2016.
[117]
Bem, R. D.; Ghosh, A.; Boukhayma, A.; Ajanthan, T.; Siddharth, N.; Torr, P. A conditional deep generative model of people in natural images. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 1449–1458, 2019.
[118]
Chen, L. C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A. L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 40, No. 4, 834–848, 2018.
[119]
Chen, L. C.; Zhu, Y. K.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Computer Vision – ECCV 2018. Lecture Notes in Computer Science, Vol. 11211. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 833–851, 2018.
[120]
Balakrishnan, G.; Zhao, A.; Dalca, A. V.; Durand, F.; Guttag, J. Synthesizing images of humans in unseen poses. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8340–8348, 2018.
[121]
Pumarola, A.; Agudo, A.; Sanfeliu, A.; Moreno-Noguer, F. Unsupervised person image synthesis in arbitrary poses. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8620–8628, 2018.
[122]
Dong, H.; Liang, X.; Gong, K.; Lai, H.; Zhu, J.; Yin, J. Soft-gated warping-GAN for pose-guided person image synthesis. In: Proceedings of the 32nd Conference on Neural Information Processing Systems, 474–484, 2018.
[123]
Li, Y. N.; Huang, C.; Loy, C. C. Dense intrinsic appearance flow for human pose transfer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3688–3697, 2019.
[124]
Zheng, L.; Shen, L. Y.; Tian, L.; Wang, S. J.; Wang, J. D.; Tian, Q. Scalable person re-identification: A benchmark. In: Proceedings of the IEEE International Conference on Computer Vision, 1116–1124, 2015.
[125]
Yan, X. C.; Yang, J. M.; Sohn, K.; Lee, H. Attribute2Image: Conditional image generation from visual attributes. In: Computer Vision – ECCV 2016. Lecture Notes in Computer Science, Vol. 9908. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 776–791, 2016.
[126]
Huang, G. B.; Ramesh, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49. University of Massachusetts, 2007.
[127]
He, Z. L.; Zuo, W. M.; Kan, M. N.; Shan, S. G.; Chen, X. L. AttGAN: Facial attribute editing by only changing what you want. IEEE Transactions on Image Processing Vol. 28, No. 11, 5464–5478, 2019.
[128]
Zhang, G.; Kan, M. N.; Shan, S. G.; Chen, X. L. Generative adversarial network with spatial attention for face attribute editing. In: Computer Vision – ECCV 2018. Lecture Notes in Computer Science, Vol. 11210. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 422–437, 2018.
[129]
Qian, S. J.; Lin, K. Y.; Wu, W.; Liu, Y.; Wang, Q.; Shen, F. M.; Qian, C.; He, R. Make a face: Towards arbitrary high fidelity face manipulation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 10032–10041, 2019.
[130]
Men, Y. F.; Mao, Y. M.; Jiang, Y. N.; Ma, W. Y.; Lian, Z. H. Controllable person image synthesis with attribute-decomposed GAN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5083–5092, 2020.
[131]
Lee, H.; Lee, S. G. Fashion attributes-to-image synthesis using attention-based generative adversarial network. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 462–470, 2019.
[132]
Dinh, L.; Sohl-Dickstein, J.; Bengio, S. Density estimation using real NVP. arXiv preprint arXiv:1605.08803, 2016.
[133]
Zhao, B.; Meng, L. L.; Yin, W. D.; Sigal, L. Image generation from layout. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8576–8585, 2019.
[134]
Luo, A.; Zhang, Z. T.; Wu, J. J.; Tenenbaum, J. B. End-to-end optimization of scene layout. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3753–3762, 2020.
[135]
Song, S. R.; Yu, F.; Zeng, A.; Chang, A. X.; Savva, M.; Funkhouser, T. Semantic scene completion from a single depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 190–198, 2017.
[136]
Choi, Y.; Choi, M.; Kim, M.; Ha, J. W.; Kim, S.; Choo, J. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8789–8797, 2018.
[137]
Vahdat, A.; Kautz, J. NVAE: A deep hierarchical variational autoencoder. In: Proceedings of the 34th Conference on Neural Information Processing Systems, 2020.
[138]
Zhang, H.; Goodfellow, I. J.; Metaxas, D.; Odena, A. Self-attention generative adversarial networks. In: Proceedings of the International Conference on Machine Learning, 7354–7363, 2019.
[139]
De Vries, H.; Strub, F.; Mary, J.; Larochelle, H.; Pietquin, O.; Courville, A. Modulating early visual processing by language. In: Proceedings of the 30th Conference on Neural Information Processing Systems, 6594–6604, 2017.
[140]
Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral normalization for generative adversarial networks. In: Proceedings of the International Conference on Learning Representations, 2018.
[141]
Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning, Vol. 70, 214–223, 2017.
[142]
Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. C. Improved training of Wasserstein GANs. In: Proceedings of the 30th Conference on Neural Information Processing Systems, 5767–5777, 2017.
[143]
Mao, X. D.; Li, Q.; Xie, H. R.; Lau, R. Y. K.; Wang, Z.; Smolley, S. P. Least squares generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, 2813–2821, 2017.
[144]
Lim, J. H.; Ye, J. C. Geometric GAN. arXiv preprint arXiv:1705.02894, 2017.
[145]
Johnson, J.; Alahi, A.; Li, F. F. Perceptual losses for real-time style transfer and super-resolution. In: Computer Vision – ECCV 2016. Lecture Notes in Computer Science, Vol. 9906. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 694–711, 2016.
[146]
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. In: Proceedings of the International Conference on Learning Representations, 2018.
[147]
Ionescu, C.; Papava, D.; Olaru, V.; Sminchisescu, C. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 36, No. 7, 1325–1339, 2014.
[148]
Li, Y. T.; Gan, Z.; Shen, Y. L.; Liu, J. J.; Cheng, Y.; Wu, Y. X.; Carin, L.; Carlson, D.; Gao, J. F. StoryGAN: A sequential conditional GAN for story visualization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6322–6331, 2019.
[149]
Pan, Y. W.; Qiu, Z. F.; Yao, T.; Li, H. Q.; Mei, T. To create what you tell: Generating videos from captions. In: Proceedings of the 25th ACM International Conference on Multimedia, 1789–1798, 2017.
[150]
Li, Y.; Min, M. R.; Shen, D.; Carlson, D.; Carin, L. Video generation from text. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
[151]
Wang, M.; Yang, G.-W.; Hu, S.-M.; Yau, S.-T.; Shamir, A. Write-a-video: Computational video montage from themed text. ACM Transactions on Graphics Vol. 38, No. 6, Article No. 177, 2019.
[152]
Chen, L. L.; Maddox, R. K.; Duan, Z. Y.; Xu, C. L. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7824–7833, 2019.
[153]
Zhou, H.; Liu, Y.; Liu, Z. W.; Luo, P.; Wang, X. G. Talking face generation by adversarially disentangled audio-visual representation. Proceedings of the AAAI Conference on Artificial Intelligence Vol. 33, 9299–9306, 2019.
[154]
Wen, X.; Wang, M.; Richardt, C.; Chen, Z. Y.; Hu, S. M. Photorealistic audio-driven video portraits. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 12, 3457–3466, 2020.
[155]
Mescheder, L.; Nowozin, S.; Geiger, A. Adversarial variational Bayes: Unifying variational autoencoders and generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning, 2391–2400, 2017.