OpenCodePapers

Text-to-Image Generation on COCO

Leaderboard
| Paper | Code | FID | Inception score | FID-1 | FID-2 | FID-4 | FID-8 | SOA-C | Zero-shot FID | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Data Extrapolation for Text-to-image Generation on Small Datasets | ✓ | 5.00 |  |  |  |  |  |  |  | RAT-Diffusion | 2024-10-02 |
| Re-Imagen: Retrieval-Augmented Text-to-Image Generator |  | 5.25 |  |  |  |  |  |  |  | Re-Imagen (Finetuned) | 2022-09-29 |
| All are Worth Words: A ViT Backbone for Diffusion Models | ✓ | 5.48 |  |  |  |  |  |  |  | U-ViT-S/2-Deep | 2022-09-25 |
| GLIGEN: Open-Set Grounded Text-to-Image Generation | ✓ | 5.61 |  |  |  |  |  |  |  | GLIGEN (fine-tuned, Detection + Caption data) | 2023-01-17 |
| GLIGEN: Open-Set Grounded Text-to-Image Generation | ✓ | 5.82 |  |  |  |  |  |  |  | GLIGEN (fine-tuned, Detection data only) | 2023-01-17 |
| All are Worth Words: A ViT Backbone for Diffusion Models | ✓ | 5.95 |  |  |  |  |  |  |  | U-ViT-S/2 | 2022-09-25 |
| Improving Diffusion-Based Image Synthesis with Context Prediction |  | 6.21 |  |  |  |  |  |  | 6.21 | ConPreDiff | 2024-01-04 |
| Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders | ✓ | 6.29 |  |  |  |  |  |  |  | TLDM | 2022-02-19 |
| GLIGEN: Open-Set Grounded Text-to-Image Generation | ✓ | 6.38 |  |  |  |  |  |  |  | GLIGEN (fine-tuned, Grounding data) | 2023-01-17 |
| RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | ✓ | 6.61 |  |  |  |  |  |  |  | RAPHAEL (zero-shot) | 2023-05-29 |
| ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts | ✓ | 6.75 |  |  |  |  |  |  |  | ERNIE-ViLG 2.0 (zero-shot) | 2022-10-27 |
| Re-Imagen: Retrieval-Augmented Text-to-Image Generator |  | 6.88 |  |  |  |  |  |  |  | Re-Imagen | 2022-09-29 |
| eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers | ✓ | 6.95 |  |  |  |  |  |  |  | eDiff-I (zero-shot) | 2022-11-02 |
| Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation |  | 7.21 | 31.46 |  |  |  |  |  |  | Swinv2-Imagen | 2022-10-18 |
| Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | ✓ | 7.27 |  |  |  |  |  |  |  | Imagen (zero-shot) | 2022-05-23 |
| Scaling up GANs for Text-to-Image Synthesis | ✓ | 7.28 |  |  |  |  |  |  |  | GigaGAN (Zero-shot, 64x64) | 2023-03-09 |
| StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis | ✓ | 7.3 |  |  |  |  |  |  |  | StyleGAN-T (Zero-shot, 64x64) | 2023-01-23 |
| Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | ✓ | 7.55 |  |  |  |  |  |  |  | Make-a-Scene (unfiltered) | 2022-03-24 |
| Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion | ✓ | 8.03 |  |  |  |  |  |  |  | Kandinsky | 2023-10-05 |
| LAFITE: Towards Language-Free Training for Text-to-Image Generation | ✓ | 8.12 | 32.34 |  |  |  |  | 61.09 |  | Lafite | 2021-11-27 |
| Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation | ✓ | 8.15 |  |  |  |  |  |  | 8.15 | SiD-LSG (Data-free distillation, zero-shot FID) | 2024-06-03 |
| Simple diffusion: End-to-end diffusion for high resolution images | ✓ | 8.3 |  |  |  |  |  |  |  | simple diffusion (U-ViT) | 2023-01-26 |
| Scaling up GANs for Text-to-Image Synthesis | ✓ | 9.09 |  |  |  |  |  |  |  | GigaGAN (Zero-shot, 256x256) | 2023-03-09 |
| NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ | 9.3 | 30.5 |  |  |  |  |  |  | XMC-GAN (256 x 256) | 2021-11-24 |
| Cross-Modal Contrastive Learning for Text-to-Image Generation | ✓ | 9.33 |  |  |  |  |  |  |  | XMC-GAN | 2021-01-12 |
| Hierarchical Text-Conditional Image Generation with CLIP Latents | ✓ | 10.39 |  |  |  |  |  |  |  | DALL-E 2 | 2022-04-13 |
| Shifted Diffusion for Text-to-image Generation | ✓ | 10.6 |  |  |  |  |  |  |  | Corgi-Semi | 2022-11-24 |
| Shifted Diffusion for Text-to-image Generation | ✓ | 10.88 |  |  |  |  |  |  |  | Corgi | 2022-11-24 |
| TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation | ✓ | 10.9 |  |  |  |  |  |  |  | TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot) | 2023-04-26 |
| Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | ✓ | 11.84 |  |  |  |  |  |  |  | Make-a-Scene (unfiltered) | 2022-03-24 |
| GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models | ✓ | 12.24 |  |  |  |  |  |  |  | GLIDE (zero-shot) | 2021-12-20 |
| KNN-Diffusion: Image Generation via Large-Scale Retrieval |  | 12.5 |  |  |  |  |  |  |  | KNN-Diffusion | 2022-04-06 |
| GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis | ✓ | 12.54 |  |  |  |  |  |  |  | GALIP (CC12m) | 2023-01-30 |
| High-Resolution Image Synthesis with Latent Diffusion Models | ✓ | 12.63 |  |  |  |  |  |  |  | Latent Diffusion (LDM-KL-8-G) | 2021-12-20 |
| Retrieval-Augmented Multimodal Language Modeling |  | 12.63 |  |  |  |  |  |  |  | Stable Diffusion | 2022-11-22 |
| NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ | 12.9 | 27.2 |  |  |  |  |  |  | NÜWA (256 x 256) | 2021-11-24 |
| Vector Quantized Diffusion Model for Text-to-Image Synthesis | ✓ | 13.86 |  |  |  |  |  |  |  | VQ-Diffusion-F | 2021-11-29 |
| StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis | ✓ | 13.9 |  |  |  |  |  |  |  | StyleGAN-T (Zero-shot, 256x256) | 2023-01-23 |
| Recurrent Affine Transformation for Text-to-image Synthesis | ✓ | 14.6 |  |  |  |  |  |  |  | RAT-GAN | 2022-04-22 |
| ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | ✓ | 14.7 |  |  |  |  |  |  |  | ERNIE-ViLG | 2021-12-31 |
| Retrieval-Augmented Multimodal Language Modeling |  | 15.7 |  |  |  |  |  |  |  | RA-CM3 (2.7B) | 2022-11-22 |
| CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers | ✓ | 17.7 |  |  |  |  |  |  |  | CogView2(6B, Finetuned) | 2022-04-28 |
| Vector Quantized Diffusion Model for Text-to-Image Synthesis | ✓ | 19.75 |  |  |  |  |  |  |  | VQ-Diffusion-B | 2021-11-29 |
| Improving Text-to-Image Synthesis Using Contrastive Learning | ✓ | 20.79 | 33.34 |  |  |  |  |  |  | DM-GAN+CL | 2021-07-06 |
| FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization | ✓ | 21.16 | 34.26 |  |  |  |  |  |  | FuseDream (few-shot, k=5) | 2021-12-02 |
| FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization | ✓ | 21.16 | 34.26 |  |  |  |  |  |  | FuseDream (k=5, 256) | 2021-12-02 |
| FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization | ✓ | 21.89 | 34.67 |  |  |  |  |  |  | FuseDream (k=10, 256) | 2021-12-02 |
| Improving Text-to-Image Synthesis Using Contrastive Learning | ✓ | 23.93 | 25.70 |  |  |  |  |  |  | AttnGAN+CL | 2021-07-06 |
| CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers | ✓ | 24 |  |  |  |  |  |  |  | CogView2(6B, Finetuned) | 2022-04-28 |
| Semantic Object Accuracy for Generative Text-to-Image Synthesis | ✓ | 24.70 | 27.88 |  |  |  |  | 35.85 |  | OP-GAN | 2019-10-29 |
| NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ | 26.0 | 32.2 |  |  |  |  |  |  | DM-GAN (256 x 256) | 2021-11-24 |
| LAFITE: Towards Language-Free Training for Text-to-Image Generation | ✓ | 26.94 | 26.02 | 22.97 | 18.70 | 15.72 | 14.79 |  |  | Lafite (zero-shot) | 2021-11-27 |
| CogView: Mastering Text-to-Image Generation via Transformers | ✓ | 27.1 | 18.2 | 19.4 | 13.9 | 19.4 | 23.6 |  |  | CogView | 2021-05-26 |
| NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ | 27.1 | 18.2 |  |  |  |  |  |  | CogView (256 x 256) | 2021-11-24 |
| NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ | 27.5 | 17.9 |  |  |  |  |  |  | DALL-E (256 x 256) | 2021-11-24 |
| Retrieval-Augmented Multimodal Language Modeling |  | 28 |  |  |  |  |  |  |  | DALL-E (12B) | 2022-11-22 |
| VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks | ✓ | 29.26 | 28.18 |  |  |  |  |  |  | AttnGAN + VICTR | 2020-10-07 |
| Retrieval-Augmented Multimodal Language Modeling |  | 29.5 |  |  |  |  |  |  |  | Vanilla CM3 | 2022-11-22 |
| VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks | ✓ | 32.37 | 32.37 |  |  |  |  |  |  | DM-GAN + VICTR | 2020-10-07 |
| DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis | ✓ | 32.64 | 30.49 |  |  |  |  | 33.44 |  | DM-GAN | 2019-04-02 |
| Generating Multiple Objects at Spatially Distinct Locations | ✓ | 33.35 | 24.76 |  |  |  |  | 25.46 |  | AttnGAN + OP | 2019-01-03 |
| NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ | 35.2 | 23.3 |  |  |  |  |  |  | AttnGAN (256 x 256) | 2021-11-24 |
| L-Verse: Bidirectional Generation Between Image and Text | ✓ | 37.2 |  | 31.6 | 25.7 | 21.4 | 21.1 |  |  | L-Verse-CC | 2021-11-22 |
| L-Verse: Bidirectional Generation Between Image and Text | ✓ | 45.8 |  | 41.9 | 35.5 | 30.2 | 29.83 |  |  | L-Verse | 2021-11-22 |
| Generating Multiple Objects at Spatially Distinct Locations | ✓ | 55.30 | 12.12 |  |  |  |  |  |  | StackGAN + OP | 2019-01-03 |
| StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks | ✓ | 74.05 | 8.45 |  |  |  |  |  |  | StackGAN-v1 | 2017-10-19 |
| NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ |  | 18.7 |  |  |  |  |  |  | DF-GAN (256 x 256) | 2021-11-24 |
| VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks | ✓ |  | 10.38 |  |  |  |  |  |  | StackGAN + VICTR | 2020-10-07 |
| ChatPainter: Improving Text to Image Generation using Dialogue |  |  | 9.74 |  |  |  |  |  |  | ChatPainter | 2018-02-22 |
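
FID (Fréchet Inception Distance), the headline metric in this leaderboard, is the Fréchet distance between Gaussians fitted to Inception-v3 features of generated images and of reference COCO images (lower is better); the Inception score instead rewards confident, diverse class predictions on the generated images (higher is better). The snippet below is a minimal sketch of the Fréchet distance computation from pre-extracted feature matrices, assuming a hypothetical `frechet_distance` helper; it is not the evaluation code of any listed paper, whose protocols differ (zero-shot vs. fine-tuned, caption sampling, resolution).

```python
# Minimal sketch of the Frechet (Inception) distance between two feature sets --
# a generic illustration, not the evaluation protocol of any paper above.
import numpy as np
from scipy import linalg


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature matrices
    of shape (n_samples, feature_dim), e.g. Inception-v3 pool features."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)

    diff = mu_r - mu_g
    # Matrix square root of the product of the two covariances.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))


# Toy usage with random "features"; real FID evaluations use Inception-v3
# features of tens of thousands of COCO reference and generated images.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(1000, 64))
    fake = rng.normal(loc=0.1, size=(1000, 64))
    print(f"Frechet distance: {frechet_distance(real, fake):.3f}")
```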