Unified Continuous Generative Models | ✓ Link | 1.06 | | 80 | SiT-XL/2 + UCGM-S (E2E-VAE + 40 sampling steps + CFG) | 2025-05-12 |
Unified Continuous Generative Models | ✓ Link | 1.21 | | 30 | UCGM-XL/2 (VA-VAE + 30 sampling steps, without guidance) | 2025-05-12 |
Unified Continuous Generative Models | ✓ Link | 1.21 | | 40 | UCGM-XL/2 (E2E-VAE + 40 sampling steps, without guidance) | 2025-05-12 |
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator | ✓ Link | 1.21 | | 50 | EDM2-L + DDO (SD-VAE, 25 steps, DPM-Solver-v3) | 2025-03-03 |
Unified Continuous Generative Models | ✓ Link | 1.21 | | 100 | LightningDiT + UCGM-S (VA-VAE + 50 sampling steps + CFG) | 2025-05-12 |
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation | ✓ Link | 1.24 | | | xAR-H | 2025-02-27 |
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers | ✓ Link | 1.26 | 314.9 | | SiT-XL/2 + REPA-E | 2025-04-15 |
DDT: Decoupled Diffusion Transformer | ✓ Link | 1.26 | 310.6 | | DDT-XL/2 (22en6de 675M + guidance interval) | 2025-04-08 |
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation | ✓ Link | 1.28 | | | xAR-L | 2025-02-27 |
Flow-Anchored Consistency Models | ✓ Link | 1.32 | | 2 | FACM (2-step) | 2025-07-04 |
Generative Modeling with Explicit Memory | ✓ Link | 1.32 | | | GMem (with the guidance interval) | 2024-12-11 |
Diffusion Models without Classifier-free Guidance | ✓ Link | 1.34 | | | SiT-XL/2 + MG | 2025-02-17 |
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model | ✓ Link | 1.35 | 318.8 | | AliTok-XL, autoregressive, 662M | 2025-06-05 |
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models | ✓ Link | 1.35 | | | LightningDiT + VA-VAE (with the guidance interval) | 2025-01-02 |
Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion | | 1.38 | | | SiD2 | 2024-10-25 |
U-REPA: Aligning Diffusion U-Nets to ViTs | ✓ Link | 1.41 | | | SiT↓-XL/2+U-REPA (with the guidance interval) | 2025-03-24 |
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model | ✓ Link | 1.42 | 326.6 | | AliTok-XL, autoregressive, 318M | 2025-06-05 |
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | ✓ Link | 1.42 | | | SiT-XL/2 + REPA (with the guidance interval) | 2024-10-09 |
Randomized Autoregressive Visual Generation | ✓ Link | 1.48 | | | RAR-XXL, autoregressive | 2024-11-01 |
Randomized Autoregressive Visual Generation | ✓ Link | 1.50 | | | RAR-XL, autoregressive | 2024-11-01 |
MaskBit: Embedding-free Image Generation via Bit Tokens | ✓ Link | 1.52 | | | MaskBit | 2024-09-24 |
Generative Modeling with Explicit Memory | ✓ Link | 1.53 | | | GMem (w/o guidance) | 2024-12-11 |
Elucidating the design space of language models for image generation | ✓ Link | 1.54 | | | ELM | 2024-10-21 |
Autoregressive Image Generation without Vector Quantization | ✓ Link | 1.55 | | | MAR-H, Diff Loss | 2024-06-17 |
PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher | ✓ Link | 1.56 | | | PaGoDA | 2024-05-23 |
Efficient Diffusion Training via Min-SNR Weighting Strategy | ✓ Link | 1.57 | | | ViT-XL/2 with limited Interval Guidance | 2023-03-16 |
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer | ✓ Link | 1.58 | | | MDTv2 | 2023-03-25 |
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves | ✓ Link | 1.58 | | | SiT-XL + SRA | 2025-05-05 |
Robust Latent Matters: Boosting Image Generation with Sampling Error | ✓ Link | 1.60 | | | RobustTok-L | 2025-03-11 |
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization | ✓ Link | 1.63 | | | DiMR-G/2R | 2024-06-13 |
FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching | ✓ Link | 1.65 | | | FlowAR | 2024-12-19 |
Flow-Anchored Consistency Models | ✓ Link | 1.70 | | 1 | FACM (1-step) | 2025-07-04 |
CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling | | 1.70 | | | DiT-XL/2 with CADS | 2023-10-26 |
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization | ✓ Link | 1.70 | | | DiMR-XL/2R | 2024-06-13 |
Randomized Autoregressive Visual Generation | ✓ Link | 1.70 | | | RAR-L, autoregressive | 2024-11-01 |
DiffiT: Diffusion Vision Transformers for Image Generation | ✓ Link | 1.73 | | | DiffiT | 2023-12-04 |
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | ✓ Link | 1.73 | | | VAR (Visual Autoregressive) | 2024-04-03 |
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | ✓ Link | 1.78 | | | MAGVIT-v2 | 2023-10-09 |
Autoregressive Image Generation without Vector Quantization | ✓ Link | 1.78 | | | MAR-L, Diff Loss | 2024-06-17 |
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer | ✓ Link | 1.79 | | | MDT | 2023-03-25 |
Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models | ✓ Link | 1.83 | | | Discriminator Guidance | 2022-11-28 |
Diffusion Models Need Visual Priors for Image Generation | | 1.83 | | | DoD-XL | 2024-10-11 |
Robust Latent Matters: Boosting Image Generation with Sampling Error | ✓ Link | 1.83 | | | RobustTok-B | 2025-03-11 |
Autoregressive Image Generation with Randomized Parallel Decoding | ✓ Link | 1.94 | | | ARPG-XXL | 2025-03-13 |
Randomized Autoregressive Visual Generation | ✓ Link | 1.95 | | | RAR-B, autoregressive | 2024-11-01 |
An Image is Worth 32 Tokens for Reconstruction and Generation | ✓ Link | 1.97 | | | TiTok-S-128 | 2024-06-11 |
PixelFlow: Pixel-Space Generative Models with Flow | ✓ Link | 1.98 | | | PixelFlow | 2025-04-10 |
Relay Diffusion: Unifying diffusion process across resolutions for image synthesis | ✓ Link | 1.99 | | | RDM | 2023-09-04 |
FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification | ✓ Link | 2.03 | | | FasterDiT-XL/2 | 2024-10-14 |
Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling | ✓ Link | 2.05 | 338.08 | | LEGO-XL | 2023-10-10 |
Autoregressive Image Generation with Randomized Parallel Decoding | ✓ Link | 2.1 | | | ARPG-XL | 2025-03-13 |
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer | ✓ Link | 2.14 | | | StyleSAN-XL | 2023-01-30 |
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | ✓ Link | 2.18 | | | LlamaGen | 2024-06-10 |
Scalable Diffusion Models with Transformers | ✓ Link | 2.27 | | | DiT-XL/2 | 2022-12-19 |
StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets | ✓ Link | 2.30 | | | StyleGAN-XL | 2022-02-01 |
Autoregressive Image Generation without Vector Quantization | ✓ Link | 2.31 | | | MAR-B, Diff Loss | 2024-06-17 |
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation | ✓ Link | 2.33 | | | Open-MAGVIT2-XL | 2024-09-06 |
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer | ✓ Link | 2.37 | | | ACDiT | 2024-12-10 |
Autoregressive Image Generation with Randomized Parallel Decoding | ✓ Link | 2.44 | | | ARPG-L | 2025-03-13 |
An Image is Worth 32 Tokens for Reconstruction and Generation | ✓ Link | 2.48 | | | TiTok-B-64 | 2024-06-11 |
GIVT: Generative Infinite-Vocabulary Transformers | ✓ Link | 2.59 | | | GIVT-Causal-L+A | 2023-12-04 |
 | | 2.74 | | | Patch Diffusion | |
An Image is Worth 32 Tokens for Reconstruction and Generation | ✓ Link | 2.77 | | | TiTok-B-32 | 2024-06-11 |
Diffusion Models Need Visual Priors for Image Generation | | 2.79 | | | DoD-B | 2024-10-11 |
Polynomial Implicit Neural Representations For Large Diverse Datasets | ✓ Link | 2.86 | | | Poly-INR | 2023-03-20 |
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | ✓ Link | 3.02 | 294.1 | | MGVQ | 2025-07-14 |
Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models | ✓ Link | 3.18 | | | ADM-G++ (FID) | 2022-11-28 |
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective | ✓ Link | 3.39 | 205.96 | | DiGIT-0.7B | 2024-10-16 |
Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer | | 3.41 | | | Contextual RQ-Transformer | 2022-06-09 |
Scaling up GANs for Text-to-Image Synthesis | ✓ Link | 3.45 | | | GigaGAN | 2023-03-09 |
Return of Unconditional Generation: A Self-supervised Representation Generation Method | ✓ Link | 3.49 | | | RCG-L (w/o guidance) | 2023-12-06 |
BIGRoC: Boosting Image Generation via a Robust Classifier | ✓ Link | 3.63 | | | BIGRoC-gt (Guided-Diffusion) | 2021-08-08 |
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | ✓ Link | 3.65 | | | MAGVIT-v2 (w/o guidance) | 2023-10-09 |
BIGRoC: Boosting Image Generation via a Robust Classifier | ✓ Link | 3.69 | | | BIGRoC-pl (Guided-Diffusion) | 2021-08-08 |
Simple diffusion: End-to-end diffusion for high resolution images | ✓ Link | 3.71 | | | simple diffusion (U-Net) | 2023-01-26 |
Simple diffusion: End-to-end diffusion for high resolution images | ✓ Link | 3.75 | | | simple diffusion (U-ViT, L) | 2023-01-26 |
Autoregressive Image Generation using Residual Quantization | ✓ Link | 3.83 | | | RQ-Transformer | 2022-03-03 |
Diffusion Models Beat GANs on Image Synthesis | ✓ Link | 3.94 | | | ADM-G, ADM-U | 2021-05-11 |
Entropy-driven Sampling and Training Scheme for Conditional Diffusion Generation | ✓ Link | 3.96 | | | ADM-G + EDS (ED-DPM, classifier_scale=0.75) | 2022-06-23 |
MaskGIT: Masked Generative Image Transformer | ✓ Link | 4.02 | | | MaskGIT (a=0.05) | 2022-02-08 |
Entropy-driven Sampling and Training Scheme for Conditional Diffusion Generation | ✓ Link | 4.09 | | | ADM-G + EDS + ECT (ED-DPM, classifier_scale=1.0) | 2022-06-23 |
 | | 4.29 | | | LDM | |
Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models | ✓ Link | 4.45 | | | ADM-G++ (Recall) | 2022-11-28 |
Flow Matching in Latent Space | ✓ Link | 4.46 | | | LFM | 2023-07-17 |
Scalable Adaptive Computation for Iterative Generation | ✓ Link | 4.51 | | | RIN | 2022-12-22 |
Diffusion Models Beat GANs on Image Synthesis | ✓ Link | 4.59 | | | ADM-G | 2021-05-11 |
Cascaded Diffusion Models for High Fidelity Image Generation | | 4.88 | | | CDM | 2021-05-30 |
Taming Transformers for High-Resolution Image Synthesis | ✓ Link | 5.2 | | | VQGAN+Transformer (k=600, p=1.0, a=0.05) | 2020-12-17 |
MaskGIT: Masked Generative Image Transformer | ✓ Link | 6.18 | | | MaskGIT | 2022-02-08 |
Taming Transformers for High-Resolution Image Synthesis | ✓ Link | 6.59 | | | VQGAN+Transformer (k=mixed, p=1.0, a=0.005) | 2020-12-17 |
Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values | ✓ Link | 6.82 | | | Polarity-BigGAN | 2022-03-03 |
Large Scale GAN Training for High Fidelity Natural Image Synthesis | ✓ Link | 8.1 | | | BigGAN-deep | 2018-09-28 |
 | | 11.84 | | | ADM | |
Improved Denoising Diffusion Probabilistic Models | ✓ Link | 12.3 | | | Improved DDPM | 2021-02-18 |