OpenCodePapers

video-generation-on-ucf-101

Video Generation
Leaderboard
| Paper | Code | FVD16 | FVD128 | Inception Score | KVD16 | Model Name | Release Date |
|---|---|---|---|---|---|---|---|
| Photorealistic Video Generation with Diffusion Models | | 36±2 | | | | W.A.L.T-XL (class-conditional) | 2023-12-11 |
| Video-GPT via Next Clip Diffusion | ✓ Link | 53 | | | | Video-GPT | 2025-05-18 |
| LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | ✓ Link | 57 | | | | LARP | 2024-10-28 |
| Long-Context Autoregressive Video Modeling with Next-Frame Prediction | ✓ Link | 57 | | | | FAR | 2025-03-25 |
| Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | ✓ Link | 58±3 | | | | MAGVIT-v2 | 2023-10-09 |
| Hierarchical Patch Diffusion Models for High-Resolution Video Generation | | 66.32 | | 87.68 | | HPDM-L | 2024-06-12 |
| MAGVIT: Masked Generative Video Transformer | ✓ Link | 76±2 | | 89.27±0.15 | | MAGVIT (-L-CG, 128x128, class-conditional) | 2022-12-10 |
| Make-A-Video: Text-to-Video Generation without Text-Video Data | ✓ Link | 81.25 | | 82.55 | | Make-A-Video (Finetuning, 256x256, class-conditional) | 2022-09-29 |
| ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer | ✓ Link | 90 | | | | ACDiT | 2024-12-10 |
| Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | ✓ Link | 109 | | | | MAGVIT-v2 (AR) | 2023-10-09 |
| REGIS: Refining Generated Videos via Iterative Stylistic Redesigning | ✓ Link | 141 | | | | REGIS-Fuse (Finetuning, 128x128, text-conditional) | 2023-11-03 |
| MAGVIT: Masked Generative Video Transformer | ✓ Link | 159±2 | | 83.55±0.14 | | MAGVIT (-B-CG, 128x128, class-conditional) | 2022-12-10 |
| LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models | ✓ Link | 164.45 | | | | Latte + LeanVAE | 2025-03-18 |
| VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation | ✓ Link | 173 | | 80.03 | | VideoFusion (128x128, class-conditional) | 2023-03-15 |
| OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | ✓ Link | 191 | | | | OmniTokenizer-AR | 2024-06-13 |
| VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation | ✓ Link | 220 | | 72.22 | | VideoFusion (128x128, unconditional) | 2023-03-15 |
| Make Pixels Dance: High-Dynamic Video Generation | | 242.82 | | 42.10 | | PixelDance (256x256, text-conditional) | 2023-11-18 |
| Photorealistic Video Generation with Diffusion Models | | 258.1 | | 35.1 | | W.A.L.T 3B (text-conditional) | 2023-12-11 |
| MAGVIT: Masked Generative Video Transformer | ✓ Link | 265 | | | | MAGVIT (AR) | 2022-12-10 |
| Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | ✓ Link | 280.57 | | 44.26 | | Video-LaVIT | 2024-02-05 |
| VIDM: Video Implicit Diffusion Models | ✓ Link | 294.7 | 1531.9 | | | VIDM (256x256, unconditional) | 2022-12-01 |
| CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers | ✓ Link | 305 | | 51.11 | | CogVideo (128x128, class-conditional) | 2022-05-29 |
| Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | | 310 | | 60.01 | | PYoCo (Zero-shot, 64x64, unconditional) | 2023-05-17 |
| Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | ✓ Link | 328 | | 73.7 | | MMVG (128x128, class-conditional) | 2022-11-23 |
| Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer | ✓ Link | 332 | | 79.28 | | TATS (128x128, class-conditional) | 2022-04-07 |
| Lumiere: A Space-Time Diffusion Model for Video Generation | ✓ Link | 332.49 | | 37.54 | | Lumiere (Zero-shot. 1024x1024, text-conditional) | 2024-01-23 |
| Grid Diffusion Models for Text-to-Video Generation | | 340.062.88 | | | | GridDiff (Zero-shot) | 2024-03-30 |
| MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing | ✓ Link | 346.84 | | 48.01 | | VideoAssembler (Zero-shot, 256x256, class-conditional) | 2023-11-29 |
| VideoPoet: A Large Language Model for Zero-Shot Video Generation | | 355 | | 38.44 | | VideoPoet (text-conditional) | 2023-12-21 |
| Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | | 355.19 | | 47.76 | | PYoCo (Zero-shot, 64x64, text-conditional) | 2023-05-17 |
| Make-A-Video: Text-to-Video Generation without Text-Video Data | ✓ Link | 367.23 | | 33 | | Make-A-Video (Zero-shot, 256x256, class-conditional) | 2022-09-29 |
| Latent Video Diffusion Models for High-Fidelity Long Video Generation | ✓ Link | 372 | | | 27 | LVDM (256x256, unconditional) | 2022-11-23 |
| Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | ✓ Link | 395 | | 58.3 | | MMVG (128x128, unconditional) | 2022-11-23 |
| Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer | ✓ Link | 420 | | 57.63 | | TATS (128x128, unconditional) | 2022-04-07 |
| Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers | ✓ Link | 438 | 968 | 65.93 | | MeBT (128x128, unconditional) | 2023-03-20 |
| Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks | ✓ Link | 465 | | 59.68 | 39.6 | DIGAN (128x128, class-conditional) | 2022-02-21 |
| LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models | ✓ Link | 526.30 | | | | LAVIE (320x512, text-conditional) | 2023-09-26 |
| Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | ✓ Link | 550.61 | | 33.45 | | Video LDM (320x512, text-conditional) | 2023-04-18 |
| Latent Video Diffusion Models for High-Fidelity Long Video Generation | ✓ Link | 552 | | | 42 | LVDM (256x256, unconditional) | 2022-11-23 |
| Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks | ✓ Link | 577 | | 32.70 | | DIGAN (128x128, unconditional) | 2022-02-21 |
| Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer | ✓ Link | 635 | | | 55 | TATS (256x256) | 2022-04-07 |
| MagicVideo: Efficient Video Generation With Latent Diffusion Models | | 699 | | | | MagicVideo (256x256, text-conditional) | 2022-11-20 |
| A Good Image Generator Is What You Need for High-Resolution Video Synthesis | ✓ Link | 700 | | 33.95 | | MoCoGAN-HD (256x256, unconditional) | 2021-04-30 |
| MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | ✓ Link | 1143 | | | | MCVD (64x64) | 2022-05-19 |
| Latent Video Diffusion Models for High-Fidelity Long Video Generation | ✓ Link | 1209 | | | | TGAN-v2 (128x128) | 2022-11-23 |
| Latent Video Diffusion Models for High-Fidelity Long Video Generation | ✓ Link | 1396 | | | 116 | VDM | 2022-11-23 |
| Latent Video Diffusion Models for High-Fidelity Long Video Generation | ✓ Link | 2460 | | | 148 | MCVD | 2022-11-23 |
| FIFO-Diffusion: Generating Infinite Videos from Text without Training | ✓ Link | | 596.64 | 74.44 | | FIFO-Diffusion | 2024-05-19 |