A Comprehensive Pipeline for Complex Text-to-Image Synthesis

Fei Fang; Fei Luo; Hong-Pan Zhang; Hua-Jian Zhou; Alix L. H. Chow; Chun-Xia Xiao

doi:10.1007/s11390-020-0305-9


Regular Paper


Fei Fang¹, Fei Luo¹, Hong-Pan Zhang¹, Hua-Jian Zhou¹, Alix L. H. Chow², Chun-Xia Xiao¹

¹School of Computer Science, Wuhan University, Wuhan 430072, China

²Xiaomi Technology Co., Ltd., Beijing 100085, China


Abstract

Synthesizing a complex scene image with multiple objects and a background from a text description is a challenging problem. It requires solving several difficult tasks spanning natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and object status optimization. To obtain a satisfactory result, we propose a comprehensive pipeline that converts the input text into its visual counterpart. The pipeline consists of text processing, retrieval of foreground objects and a background scene, image synthesis using constrained Markov Chain Monte Carlo (MCMC) optimization, and post-processing. First, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Second, we retrieve the required foreground objects from a foreground object dataset segmented from the Microsoft COCO dataset, and retrieve an appropriate background scene image from a background image dataset collected from the Internet. Third, to ensure that the positions and sizes of the foreground objects are plausible in the image synthesis step, we design a cost function and use MCMC as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we blend the foreground objects into the background scene image in the post-processing step using Poisson-based and relighting-based methods. Synthesized results and comparisons on the Microsoft COCO dataset demonstrate that our method outperforms some state-of-the-art methods based on generative adversarial networks (GANs) in the visual quality of the generated scene images.
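The abstract describes the layout step only as "a cost function optimized with MCMC" under constraints. The sketch below is a minimal, hypothetical illustration of that general idea, not the authors' cost function or sampler. It assumes each foreground object is an axis-aligned box, uses an invented cost (pairwise overlap area plus deviation from a hypothetical preferred area `target_area`), treats "the object lies fully inside the background image" as a hard constraint enforced by rejecting proposals, and uses Metropolis acceptance with a geometric cooling schedule (a simulated-annealing variant of MCMC). All names (`Box`, `cost`, `propose`, `mcmc_layout`), weights, and step sizes are assumptions chosen for illustration.

```python
import math
import random
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical foreground-object placement: top-left corner plus size."""
    x: float
    y: float
    w: float
    h: float
    target_area: float  # assumed preferred area for this object (illustrative)


def overlap_area(a: Box, b: Box) -> float:
    """Area of intersection of two axis-aligned boxes (0 if disjoint)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def inside(box: Box, width: float, height: float) -> bool:
    """Hard constraint (assumed): the box must lie fully inside the canvas."""
    return box.x >= 0 and box.y >= 0 and box.x + box.w <= width and box.y + box.h <= height


def cost(layout: List[Box], w_overlap: float = 1.0, w_size: float = 1.0) -> float:
    """Illustrative soft cost: pairwise overlap plus deviation from preferred size."""
    c = 0.0
    for i in range(len(layout)):
        for j in range(i + 1, len(layout)):
            c += w_overlap * overlap_area(layout[i], layout[j])
    for b in layout:
        c += w_size * abs(b.w * b.h - b.target_area)
    return c


def propose(layout: List[Box], rng: random.Random, step: float = 10.0) -> List[Box]:
    """Perturb one randomly chosen box: either translate it or rescale it."""
    i = rng.randrange(len(layout))
    b = layout[i]
    if rng.random() < 0.5:
        nb = replace(b, x=b.x + rng.gauss(0, step), y=b.y + rng.gauss(0, step))
    else:
        s = math.exp(rng.gauss(0, 0.1))  # log-normal scale factor, keeps aspect ratio
        nb = replace(b, w=b.w * s, h=b.h * s)
    new = list(layout)
    new[i] = nb
    return new


def mcmc_layout(layout: List[Box], width: float, height: float, iters: int = 5000,
                t0: float = 100.0, t_end: float = 0.1, seed: int = 0) -> Tuple[List[Box], float]:
    """Metropolis sampling with geometric cooling; returns the best feasible layout seen.

    The initial layout is assumed to satisfy the hard constraint already.
    """
    rng = random.Random(seed)
    current = list(layout)
    cur_cost = cost(current)
    best, best_cost = current, cur_cost
    for k in range(iters):
        t = t0 * (t_end / t0) ** (k / max(1, iters - 1))  # temperature from t0 down to t_end
        cand = propose(current, rng)
        if not all(inside(b, width, height) for b in cand):
            continue  # reject proposals that violate the hard constraint
        cand_cost = cost(cand)
        delta = cand_cost - cur_cost
        # Metropolis rule: always accept improvements, sometimes accept worse states.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current, cur_cost
    return best, best_cost
```

For example, two 100×100 boxes placed at (50, 50) and (100, 100) in a 400×300 canvas overlap by 2500 px² and start with cost 2500; `mcmc_layout` returns the lowest-cost feasible layout it visited. Tracking the best state rather than the last one is a common choice when MCMC is used as an optimizer rather than as a sampler.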

Keywords

image synthesis; scene generation; text-to-image conversion; Markov Chain Monte Carlo (MCMC)

Electronic Supplementary Material


jcst-35-3-522-Highlights.pdf (884.4 KB)


Journal of Computer Science and Technology, Volume 35, Issue 3, May 2020, Pages 522-537


Cite this article:

Fang F, Luo F, Zhang H-P, et al. A Comprehensive Pipeline for Complex Text-to-Image Synthesis. Journal of Computer Science and Technology, 2020, 35(3): 522-537. https://doi.org/10.1007/s11390-020-0305-9


Received: 15 January 2020

Revised: 15 April 2020

Published: 29 May 2020