Paper | Code | FID ↓ | IS ↑ | FID-1 ↓ | FID-2 ↓ | FID-4 ↓ | FID-8 ↓ | SOA-C ↑ | Zero-shot FID ↓ | Model | Date
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Data Extrapolation for Text-to-image Generation on Small Datasets | ✓ Link | 5.00 | | | | | | | | RAT-Diffusion | 2024-10-02 |
Re-Imagen: Retrieval-Augmented Text-to-Image Generator | | 5.25 | | | | | | | | Re-Imagen (Finetuned) | 2022-09-29 |
All are Worth Words: A ViT Backbone for Diffusion Models | ✓ Link | 5.48 | | | | | | | | U-ViT-S/2-Deep | 2022-09-25 |
GLIGEN: Open-Set Grounded Text-to-Image Generation | ✓ Link | 5.61 | | | | | | | | GLIGEN (fine-tuned, Detection + Caption data) | 2023-01-17 |
GLIGEN: Open-Set Grounded Text-to-Image Generation | ✓ Link | 5.82 | | | | | | | | GLIGEN (fine-tuned, Detection data only) | 2023-01-17 |
All are Worth Words: A ViT Backbone for Diffusion Models | ✓ Link | 5.95 | | | | | | | | U-ViT-S/2 | 2022-09-25 |
Improving Diffusion-Based Image Synthesis with Context Prediction | | 6.21 | | | | | | | 6.21 | ConPreDiff | 2024-01-04 |
Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders | ✓ Link | 6.29 | | | | | | | | TLDM | 2022-02-19 |
GLIGEN: Open-Set Grounded Text-to-Image Generation | ✓ Link | 6.38 | | | | | | | | GLIGEN (fine-tuned, Grounding data) | 2023-01-17 |
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | ✓ Link | 6.61 | | | | | | | | RAPHAEL (zero-shot) | 2023-05-29 |
ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts | ✓ Link | 6.75 | | | | | | | | ERNIE-ViLG 2.0 (zero-shot) | 2022-10-27 |
Re-Imagen: Retrieval-Augmented Text-to-Image Generator | | 6.88 | | | | | | | | Re-Imagen | 2022-09-29 |
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers | ✓ Link | 6.95 | | | | | | | | eDiff-I (zero-shot) | 2022-11-02 |
Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation | | 7.21 | 31.46 | | | | | | | Swinv2-Imagen | 2022-10-18 |
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | ✓ Link | 7.27 | | | | | | | | Imagen (zero-shot) | 2022-05-23 |
Scaling up GANs for Text-to-Image Synthesis | ✓ Link | 7.28 | | | | | | | | GigaGAN (Zero-shot, 64x64) | 2023-03-09 |
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis | ✓ Link | 7.3 | | | | | | | | StyleGAN-T (Zero-shot, 64x64) | 2023-01-23 |
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | ✓ Link | 7.55 | | | | | | | | Make-a-Scene (filtered) | 2022-03-24 |
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion | ✓ Link | 8.03 | | | | | | | | Kandinsky | 2023-10-05 |
LAFITE: Towards Language-Free Training for Text-to-Image Generation | ✓ Link | 8.12 | 32.34 | | | | | 61.09 | | Lafite | 2021-11-27 |
Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation | ✓ Link | 8.15 | | | | | | | 8.15 | SiD-LSG (Data-free distillation, zero-shot FID) | 2024-06-03 |
Simple diffusion: End-to-end diffusion for high resolution images | ✓ Link | 8.3 | | | | | | | | simple diffusion (U-ViT) | 2023-01-26 |
Scaling up GANs for Text-to-Image Synthesis | ✓ Link | 9.09 | | | | | | | | GigaGAN (Zero-shot, 256x256) | 2023-03-09 |
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ Link | 9.3 | 30.5 | | | | | | | XMC-GAN (256 x 256) | 2021-11-24 |
Cross-Modal Contrastive Learning for Text-to-Image Generation | ✓ Link | 9.33 | | | | | | | | XMC-GAN | 2021-01-12 |
Hierarchical Text-Conditional Image Generation with CLIP Latents | ✓ Link | 10.39 | | | | | | | | DALL-E 2 | 2022-04-13 |
Shifted Diffusion for Text-to-image Generation | ✓ Link | 10.6 | | | | | | | | Corgi-Semi | 2022-11-24 |
Shifted Diffusion for Text-to-image Generation | ✓ Link | 10.88 | | | | | | | | Corgi | 2022-11-24 |
TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation | ✓ Link | 10.9 | | | | | | | | TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot) | 2023-04-26 |
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | ✓ Link | 11.84 | | | | | | | | Make-a-Scene (unfiltered) | 2022-03-24 |
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models | ✓ Link | 12.24 | | | | | | | | GLIDE (zero-shot) | 2021-12-20 |
KNN-Diffusion: Image Generation via Large-Scale Retrieval | | 12.5 | | | | | | | | KNN-Diffusion | 2022-04-06 |
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis | ✓ Link | 12.54 | | | | | | | | GALIP (CC12m) | 2023-01-30 |
High-Resolution Image Synthesis with Latent Diffusion Models | ✓ Link | 12.63 | | | | | | | | Latent Diffusion (LDM-KL-8-G) | 2021-12-20 |
Retrieval-Augmented Multimodal Language Modeling | | 12.63 | | | | | | | | Stable Diffusion | 2022-11-22 |
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ Link | 12.9 | 27.2 | | | | | | | NÜWA (256 x 256) | 2021-11-24 |
Vector Quantized Diffusion Model for Text-to-Image Synthesis | ✓ Link | 13.86 | | | | | | | | VQ-Diffusion-F | 2021-11-29 |
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis | ✓ Link | 13.9 | | | | | | | | StyleGAN-T (Zero-shot, 256x256) | 2023-01-23 |
Recurrent Affine Transformation for Text-to-image Synthesis | ✓ Link | 14.6 | | | | | | | | RAT-GAN | 2022-04-22 |
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | ✓ Link | 14.7 | | | | | | | | ERNIE-ViLG | 2021-12-31 |
Retrieval-Augmented Multimodal Language Modeling | | 15.7 | | | | | | | | RA-CM3 (2.7B) | 2022-11-22 |
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers | ✓ Link | 17.7 | | | | | | | | CogView2 (6B, Finetuned) | 2022-04-28 |
Vector Quantized Diffusion Model for Text-to-Image Synthesis | ✓ Link | 19.75 | | | | | | | | VQ-Diffusion-B | 2021-11-29 |
Improving Text-to-Image Synthesis Using Contrastive Learning | ✓ Link | 20.79 | 33.34 | | | | | | | DM-GAN+CL | 2021-07-06 |
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization | ✓ Link | 21.16 | 34.26 | | | | | | | FuseDream (k=5, 256 x 256) | 2021-12-02 |
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization | ✓ Link | 21.89 | 34.67 | | | | | | | FuseDream (k=10, 256) | 2021-12-02 |
Improving Text-to-Image Synthesis Using Contrastive Learning | ✓ Link | 23.93 | 25.70 | | | | | | | AttnGAN+CL | 2021-07-06 |
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers | ✓ Link | 24.0 | | | | | | | | CogView2 (6B) | 2022-04-28 |
Semantic Object Accuracy for Generative Text-to-Image Synthesis | ✓ Link | 24.70 | 27.88 | | | | | 35.85 | | OP-GAN | 2019-10-29 |
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ Link | 26.0 | 32.2 | | | | | | | DM-GAN (256 x 256) | 2021-11-24 |
LAFITE: Towards Language-Free Training for Text-to-Image Generation | ✓ Link | 26.94 | 26.02 | 22.97 | 18.70 | 15.72 | 14.79 | | | Lafite (zero-shot) | 2021-11-27 |
CogView: Mastering Text-to-Image Generation via Transformers | ✓ Link | 27.1 | 18.2 | 19.4 | 13.9 | 19.4 | 23.6 | | | CogView | 2021-05-26 |
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ Link | 27.1 | 18.2 | | | | | | | CogView (256 x 256) | 2021-11-24 |
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ Link | 27.5 | 17.9 | | | | | | | DALL-E (256 x 256) | 2021-11-24 |
Retrieval-Augmented Multimodal Language Modeling | | 28 | | | | | | | | DALL-E (12B) | 2022-11-22 |
VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks | ✓ Link | 29.26 | 28.18 | | | | | | | AttnGAN + VICTR | 2020-10-07 |
Retrieval-Augmented Multimodal Language Modeling | | 29.5 | | | | | | | | Vanilla CM3 | 2022-11-22 |
VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks | ✓ Link | 32.37 | 32.37 | | | | | | | DM-GAN + VICTR | 2020-10-07 |
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis | ✓ Link | 32.64 | 30.49 | | | | | 33.44 | | DM-GAN | 2019-04-02 |
Generating Multiple Objects at Spatially Distinct Locations | ✓ Link | 33.35 | 24.76 | | | | | 25.46 | | AttnGAN + OP | 2019-01-03 |
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ Link | 35.2 | 23.3 | | | | | | | AttnGAN (256 x 256) | 2021-11-24 |
L-Verse: Bidirectional Generation Between Image and Text | ✓ Link | 37.2 | | 31.6 | 25.7 | 21.4 | 21.1 | | | L-Verse-CC | 2021-11-22 |
L-Verse: Bidirectional Generation Between Image and Text | ✓ Link | 45.8 | | 41.9 | 35.5 | 30.2 | 29.83 | | | L-Verse | 2021-11-22 |
Generating Multiple Objects at Spatially Distinct Locations | ✓ Link | 55.30 | 12.12 | | | | | | | StackGAN + OP | 2019-01-03 |
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks | ✓ Link | 74.05 | 8.45 | | | | | | | StackGAN-v1 | 2017-10-19 |
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ Link | | 18.7 | | | | | | | DF-GAN (256 x 256) | 2021-11-24 |
VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks | ✓ Link | | 10.38 | | | | | | | StackGAN + VICTR | 2020-10-07 |
ChatPainter: Improving Text to Image Generation using Dialogue | | | 9.74 | | | | | | | ChatPainter | 2018-02-22 |
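
The table is ranked by FID (Fréchet Inception Distance), where lower is better; IS (Inception Score) and SOA-C (Semantic Object Accuracy, class average) are higher-is-better. The FID-k columns follow the CogView evaluation protocol, in which images are blurred with a Gaussian filter of radius k before scoring.

For reference, the sketch below shows the standard FID computation, assuming Inception-v3 pool3 activations for the real and generated images are already available as NumPy arrays. The helper names (`frechet_distance`, `fid_from_features`) are illustrative, not taken from any of the papers above; preprocessing and feature-extraction details vary between implementations, which is one reason published scores are not always directly comparable.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """Frechet distance between two Gaussians (mu, sigma) fitted to feature sets."""
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if not np.isfinite(covmean).all():
        # Numerical fallback: regularize the diagonals and retry.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset))
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))

def fid_from_features(feats_real, feats_gen):
    """feats_*: (N, D) arrays of Inception-v3 pool3 activations (D = 2048)."""
    mu_r, sigma_r = feats_real.mean(axis=0), np.cov(feats_real, rowvar=False)
    mu_g, sigma_g = feats_gen.mean(axis=0), np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)
```

On MS-COCO, published numbers typically compare 30k generated images against validation-set statistics, but resizing and quantization details differ across implementations (e.g. the original TTUR code vs. clean-fid), so cross-paper comparisons should be read with some tolerance.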