Image Generation on ImageNet 256x256

Image Generation

Leaderboard

Paper	Code	FID	Inception score	NFE	Model name	Release date
Unified Continuous Generative Models	✓ Link	1.06		80	SiT-XL/2 + UCGM-S (E2E-VAE + 40 sampling steps + CFG)	2025-05-12
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator	✓ Link	1.21		50	EDM2-L + DDO (SD-VAE, 25 steps, DPM-Solver-v3)	2025-03-03
Unified Continuous Generative Models	✓ Link	1.21		30	UCGM-XL/2 (VA-VAE + 30 sampling steps, without guidance)	2025-05-12
Unified Continuous Generative Models	✓ Link	1.21		40	UCGM-XL/2 (E2E-VAE + 40 sampling steps, without guidance)	2025-05-12
Unified Continuous Generative Models	✓ Link	1.21		100	LightningDiT + UCGM-S (VA-VAE + 50 sampling steps + CFG)	2025-05-12
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation	✓ Link	1.24			xAR-H	2025-02-27
DDT: Decoupled Diffusion Transformer	✓ Link	1.26	310.6		DDT-XL/2 (22en6de 675M + guidance interval)	2025-04-08
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers	✓ Link	1.26	314.9		SiT-XL/2 + REPA-E	2025-04-15
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation	✓ Link	1.28			xAR-L	2025-02-27
Generative Modeling with Explicit Memory	✓ Link	1.32			GMem (with the guidance interval)	2024-12-11
Flow-Anchored Consistency Models	✓ Link	1.32		2	FACM (2-step)	2025-07-04
Diffusion Models without Classifier-free Guidance	✓ Link	1.34			SiT-XL/2 + MG	2025-02-17
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models	✓ Link	1.35			LightningDiT + VA-VAE (with the guidance interval)	2025-01-02
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model	✓ Link	1.35	318.8		AliTok-XL, autoregressive, 662M	2025-06-05
Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion		1.38			SiD2	2024-10-25
U-REPA: Aligning Diffusion U-Nets to ViTs	✓ Link	1.41			SiT↓-XL/2+U-REPA (with the guidance interval)	2025-03-24
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think	✓ Link	1.42			SiT-XL/2 + REPA (with the guidance interval)	2024-10-09
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model	✓ Link	1.42	326.6		AliTok-XL, autoregressive, 318M	2025-06-05
Randomized Autoregressive Visual Generation	✓ Link	1.48			RAR-XXL, autoregressive	2024-11-01
Randomized Autoregressive Visual Generation	✓ Link	1.50			RAR-XL, autoregressive	2024-11-01
MaskBit: Embedding-free Image Generation via Bit Tokens	✓ Link	1.52			MaskBit	2024-09-24
Generative Modeling with Explicit Memory	✓ Link	1.53			GMem (w/o guidance)	2024-12-11
Elucidating the design space of language models for image generation	✓ Link	1.54			ELM	2024-10-21
Autoregressive Image Generation without Vector Quantization	✓ Link	1.55			MAR-H, Diff Loss	2024-06-17
PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher	✓ Link	1.56			PaGoDA	2024-05-23
Efficient Diffusion Training via Min-SNR Weighting Strategy	✓ Link	1.57			ViT-XL/2 with limited Interval Guidance	2023-03-16
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer	✓ Link	1.58			MDTv2	2023-03-25
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves	✓ Link	1.58			SiT-XL + SRA	2025-05-05
Robust Latent Matters: Boosting Image Generation with Sampling Error	✓ Link	1.60			RobustTok-L	2025-03-11
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization	✓ Link	1.63			DiMR-G/2R	2024-06-13
FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching	✓ Link	1.65			FlowAR	2024-12-19
CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling		1.70			DiT-XL/2 with CADS	2023-10-26
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization	✓ Link	1.70			DiMR-XL/2R	2024-06-13
Randomized Autoregressive Visual Generation	✓ Link	1.70			RAR-L, autoregressive	2024-11-01
Flow-Anchored Consistency Models	✓ Link	1.70		1	FACM (1-step)	2025-07-04
DiffiT: Diffusion Vision Transformers for Image Generation	✓ Link	1.73			DiffiT	2023-12-04
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction	✓ Link	1.73			VAR (Visual Autoregressive)	2024-04-03
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation	✓ Link	1.78			MAGVIT-v2	2023-10-09
Autoregressive Image Generation without Vector Quantization	✓ Link	1.78			MAR-L, Diff Loss	2024-06-17
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer	✓ Link	1.79			MDT	2023-03-25
Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models	✓ Link	1.83			Discriminator Guidance	2022-11-28
Diffusion Models Need Visual Priors for Image Generation		1.83			DoD-XL	2024-10-11
Robust Latent Matters: Boosting Image Generation with Sampling Error	✓ Link	1.83			RobustTok-B	2025-03-11
Autoregressive Image Generation with Randomized Parallel Decoding	✓ Link	1.94			ARPG-XXL	2025-03-13
Randomized Autoregressive Visual Generation	✓ Link	1.95			RAR-B, autoregressive	2024-11-01
An Image is Worth 32 Tokens for Reconstruction and Generation	✓ Link	1.97			TiTok-S-128	2024-06-11
PixelFlow: Pixel-Space Generative Models with Flow	✓ Link	1.98			PixelFlow	2025-04-10
Relay Diffusion: Unifying diffusion process across resolutions for image synthesis	✓ Link	1.99			RDM	2023-09-04
FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification	✓ Link	2.03			FasterDiT-XL/2	2024-10-14
Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling	✓ Link	2.05	338.08		LEGO-XL	2023-10-10
Autoregressive Image Generation with Randomized Parallel Decoding	✓ Link	2.1			ARPG-XL	2025-03-13
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer	✓ Link	2.14			StyleSAN-XL	2023-01-30
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation	✓ Link	2.18			LlamaGen	2024-06-10
Scalable Diffusion Models with Transformers	✓ Link	2.27			DiT-XL/2	2022-12-19
StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets	✓ Link	2.30			StyleGAN-XL	2022-02-01
Autoregressive Image Generation without Vector Quantization	✓ Link	2.31			MAR-B, Diff Loss	2024-06-17
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation	✓ Link	2.33			Open-MAGVIT2-XL	2024-09-06
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer	✓ Link	2.37			ACDiT	2024-12-10
Autoregressive Image Generation with Randomized Parallel Decoding	✓ Link	2.44			ARPG-L	2025-03-13
An Image is Worth 32 Tokens for Reconstruction and Generation	✓ Link	2.48			TiTok-B-64	2024-06-11
GIVT: Generative Infinite-Vocabulary Transformers	✓ Link	2.59			GIVT-Causal-L+A	2023-12-04
(no paper listed)		2.74			Patch Diffusion
An Image is Worth 32 Tokens for Reconstruction and Generation	✓ Link	2.77			TiTok-B-32	2024-06-11
Diffusion Models Need Visual Priors for Image Generation		2.79			DoD-B	2024-10-11
Polynomial Implicit Neural Representations For Large Diverse Datasets	✓ Link	2.86			Poly-INR	2023-03-20
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization	✓ Link	3.02	294.1		MGVQ	2025-07-14
Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models	✓ Link	3.18			ADM-G++ (FID)	2022-11-28
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective	✓ Link	3.39	205.96		DiGIT-0.7B	2024-10-16
Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer		3.41			Contextual RQ-Transformer	2022-06-09
Scaling up GANs for Text-to-Image Synthesis	✓ Link	3.45			GigaGAN	2023-03-09
Return of Unconditional Generation: A Self-supervised Representation Generation Method	✓ Link	3.49			RCG-L (w/o guidance)	2023-12-06
BIGRoC: Boosting Image Generation via a Robust Classifier	✓ Link	3.63			BIGRoC-gt (Guided-Diffusion)	2021-08-08
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation	✓ Link	3.65			MAGVIT-v2 (w/o guidance)	2023-10-09
BIGRoC: Boosting Image Generation via a Robust Classifier	✓ Link	3.69			BIGRoC-pl (Guided-Diffusion)	2021-08-08
Simple diffusion: End-to-end diffusion for high resolution images	✓ Link	3.71			simple diffusion (U-Net)	2023-01-26
Simple diffusion: End-to-end diffusion for high resolution images	✓ Link	3.75			simple diffusion (U-ViT, L)	2023-01-26
Autoregressive Image Generation using Residual Quantization	✓ Link	3.83			RQ-Transformer	2022-03-03
Diffusion Models Beat GANs on Image Synthesis	✓ Link	3.94			ADM-G, ADM-U	2021-05-11
Entropy-driven Sampling and Training Scheme for Conditional Diffusion Generation	✓ Link	3.96			ADM-G + EDS (ED-DPM, classifier_scale=0.75)	2022-06-23
MaskGIT: Masked Generative Image Transformer	✓ Link	4.02			MaskGIT (a=0.05)	2022-02-08
Entropy-driven Sampling and Training Scheme for Conditional Diffusion Generation	✓ Link	4.09			ADM-G + EDS + ECT (ED-DPM, classifier_scale=1.0)	2022-06-23
(no paper listed)		4.29			LDM
Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models	✓ Link	4.45			ADM-G++ (Recall)	2022-11-28
Flow Matching in Latent Space	✓ Link	4.46			LFM	2023-07-17
Scalable Adaptive Computation for Iterative Generation	✓ Link	4.51			RIN	2022-12-22
Diffusion Models Beat GANs on Image Synthesis	✓ Link	4.59			ADM-G	2021-05-11
Cascaded Diffusion Models for High Fidelity Image Generation		4.88			CDM	2021-05-30
Taming Transformers for High-Resolution Image Synthesis	✓ Link	5.2			VQGAN+Transformer (k=600, p=1.0, a=0.05)	2020-12-17
MaskGIT: Masked Generative Image Transformer	✓ Link	6.18			MaskGIT	2022-02-08
Taming Transformers for High-Resolution Image Synthesis	✓ Link	6.59			VQGAN+Transformer (k=mixed, p=1.0, a=0.005)	2020-12-17
Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values	✓ Link	6.82			Polarity-BigGAN	2022-03-03
Large Scale GAN Training for High Fidelity Natural Image Synthesis	✓ Link	8.1			BigGAN-deep	2018-09-28
(no paper listed)		11.84			ADM
Improved Denoising Diffusion Probabilistic Models	✓ Link	12.3			Improved DDPM	2021-02-18

OpenCodePapers