Volume 22, Issue 6


A Survey of Image Synthesis and Editing with Generative Adversarial Networks

Xian Wu, Kun Xu, Peter Hall
TNList and the Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Department of Computer Science, University of Bath, Bath, UK.

Abstract

This paper surveys image synthesis and editing with Generative Adversarial Networks (GANs). A GAN consists of two deep networks, a generator and a discriminator, that are trained in competition with each other. Owing to the power of deep networks and this competitive training, GANs can produce reasonable and realistic images and have shown great capability in many image synthesis and editing applications. The survey covers recent GAN papers on topics including, but not limited to, texture synthesis, image inpainting, image-to-image translation, and image editing.

Keywords:

image synthesis, image editing, constrained image synthesis, generative adversarial networks, image-to-image translation
Received: 15 November 2017
Accepted: 20 November 2017
Published: 14 December 2017
Issue date: December 2017
Copyright

© The author(s) 2017

Acknowledgements

This work was supported by the National Key Technology R&D Program (No. 2016YFB1001402), the National Natural Science Foundation of China (No. 61521002), the Joint NSFC-ISF Research Program (No. 61561146393), and the Research Grant of the Beijing Higher Institution Engineering Research Center and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology. This work was also supported by the EPSRC CDE (No. EP/L016540/1).
