Open Access
Feature-Grounded Single-Stage Text-to-Image Generation
Tsinghua Science and Technology 2024, 29 (2): 469-480
Published: 22 September 2023

Recently, Generative Adversarial Networks (GANs) have become the mainstream text-to-image (T2I) framework. However, noise inputs drawn from a standard normal distribution cannot provide sufficient information to synthesize images that approach the ground-truth image distribution. Moreover, the multistage generation strategy makes T2I applications complex. Therefore, this study proposes a novel feature-grounded single-stage T2I model, which takes the "real" distribution learned from the training images as one input and introduces a worst-case-optimized similarity measure into the loss function to enhance the model's generation capacity. Experimental results on two benchmark datasets demonstrate the competitive performance of the proposed model in terms of the Fréchet inception distance and inception score compared with classical and state-of-the-art models, showing improved similarity among the generated image, the text, and the ground truth.
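The "feature-grounded" input idea can be illustrated with a minimal sketch: rather than feeding the generator noise from N(0, I), fit a simple distribution to features of real training images and sample the generator input from that learned distribution. All names, shapes, and the diagonal-Gaussian assumption below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for features extracted from real training images
# (e.g., by a pretrained encoder); purely synthetic here.
real_features = rng.normal(loc=2.0, scale=0.5, size=(1000, 16))

# Fit a diagonal Gaussian to the "real" feature distribution.
mu = real_features.mean(axis=0)
sigma = real_features.std(axis=0)

def sample_grounded_input(batch_size: int) -> np.ndarray:
    """Sample generator inputs from the learned feature distribution
    instead of a standard normal distribution."""
    return mu + sigma * rng.normal(size=(batch_size, mu.shape[0]))

z = sample_grounded_input(8)  # one batch of feature-grounded inputs
```

The sampled inputs carry the statistics of the real features (here centered near 2.0 rather than 0), which is the intuition behind grounding the generator's input in the training-image distribution.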
