Video Generation on UCF-101

Video Generation

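Leaderboard

Lower FVD and KVD are better; a higher Inception Score is better. Rows are sorted by FVD16 in ascending order, and entries without an FVD16 score are listed last.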

Paper	Code	FVD16	FVD128	Inception Score	KVD16	Model Name	Release Date
Photorealistic Video Generation with Diffusion Models		36±2				W.A.L.T-XL (class-conditional)	2023-12-11
Video-GPT via Next Clip Diffusion	✓ Link	53				Video-GPT	2025-05-18
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior	✓ Link	57				LARP	2024-10-28
Long-Context Autoregressive Video Modeling with Next-Frame Prediction	✓ Link	57				FAR	2025-03-25
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation	✓ Link	58±3				MAGVIT-v2	2023-10-09
Hierarchical Patch Diffusion Models for High-Resolution Video Generation		66.32		87.68		HPDM-L	2024-06-12
MAGVIT: Masked Generative Video Transformer	✓ Link	76±2		89.27±0.15		MAGVIT (-L-CG, 128x128, class-conditional)	2022-12-10
Make-A-Video: Text-to-Video Generation without Text-Video Data	✓ Link	81.25		82.55		Make-A-Video (Finetuning, 256x256, class-conditional)	2022-09-29
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer	✓ Link	90				ACDiT	2024-12-10
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation	✓ Link	109				MAGVIT-v2 (AR)	2023-10-09
REGIS: Refining Generated Videos via Iterative Stylistic Redesigning	✓ Link	141				REGIS-Fuse (Finetuning, 128x128, text-conditional)	2023-11-03
MAGVIT: Masked Generative Video Transformer	✓ Link	159±2		83.55±0.14		MAGVIT (-B-CG, 128x128, class-conditional)	2022-12-10
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models	✓ Link	164.45				Latte + LeanVAE	2025-03-18
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation	✓ Link	173		80.03		VideoFusion (128x128, class-conditional)	2023-03-15
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation	✓ Link	191				OmniTokenizer-AR	2024-06-13
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation	✓ Link	220		72.22		VideoFusion (128x128, unconditional)	2023-03-15
Make Pixels Dance: High-Dynamic Video Generation		242.82		42.10		PixelDance (256x256, text-conditional)	2023-11-18
Photorealistic Video Generation with Diffusion Models		258.1		35.1		W.A.L.T 3B (text-conditional)	2023-12-11
MAGVIT: Masked Generative Video Transformer	✓ Link	265				MAGVIT (AR)	2022-12-10
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization	✓ Link	280.57		44.26		Video-LaVIT	2024-02-05
VIDM: Video Implicit Diffusion Models	✓ Link	294.7	1531.9			VIDM (256x256, unconditional)	2022-12-01
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers	✓ Link	305		51.11		CogVideo (128x128, class-conditional)	2022-05-29
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models		310		60.01		PYoCo (Zero-shot, 64x64, unconditional)	2023-05-17
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation	✓ Link	328		73.7		MMVG (128x128, class-conditional)	2022-11-23
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer	✓ Link	332		79.28		TATS (128x128, class-conditional)	2022-04-07
Lumiere: A Space-Time Diffusion Model for Video Generation	✓ Link	332.49		37.54		Lumiere (Zero-shot, 1024x1024, text-conditional)	2024-01-23
Grid Diffusion Models for Text-to-Video Generation		340.0		62.88		GridDiff (Zero-shot)	2024-03-30
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing	✓ Link	346.84		48.01		VideoAssembler (Zero-shot, 256x256, class-conditional)	2023-11-29
VideoPoet: A Large Language Model for Zero-Shot Video Generation		355		38.44		VideoPoet (text-conditional)	2023-12-21
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models		355.19		47.76		PYoCo (Zero-shot, 64x64, text-conditional)	2023-05-17
Make-A-Video: Text-to-Video Generation without Text-Video Data	✓ Link	367.23		33		Make-A-Video (Zero-shot, 256x256, class-conditional)	2022-09-29
Latent Video Diffusion Models for High-Fidelity Long Video Generation	✓ Link	372			27	LVDM (256x256, unconditional)	2022-11-23
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation	✓ Link	395		58.3		MMVG (128x128, unconditional)	2022-11-23
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer	✓ Link	420		57.63		TATS (128x128, unconditional)	2022-04-07
Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers	✓ Link	438	968	65.93		MeBT (128x128, unconditional)	2023-03-20
Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks	✓ Link	465		59.68	39.6	DIGAN (128x128, class-conditional)	2022-02-21
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models	✓ Link	526.30				LAVIE (320x512, text-conditional)	2023-09-26
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models	✓ Link	550.61		33.45		Video LDM (320x512, text-conditional)	2023-04-18
Latent Video Diffusion Models for High-Fidelity Long Video Generation	✓ Link	552			42	LVDM (256x256, unconditional)	2022-11-23
Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks	✓ Link	577		32.70		DIGAN (128x128, unconditional)	2022-02-21
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer	✓ Link	635			55	TATS (256x256)	2022-04-07
MagicVideo: Efficient Video Generation With Latent Diffusion Models		699				MagicVideo (256x256, text-conditional)	2022-11-20
A Good Image Generator Is What You Need for High-Resolution Video Synthesis	✓ Link	700		33.95		MoCoGAN-HD (256x256, unconditional)	2021-04-30
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation	✓ Link	1143				MCVD (64x64)	2022-05-19
Latent Video Diffusion Models for High-Fidelity Long Video Generation	✓ Link	1209				TGAN-v2 (128x128)	2022-11-23
Latent Video Diffusion Models for High-Fidelity Long Video Generation	✓ Link	1396			116	VDM	2022-11-23
Latent Video Diffusion Models for High-Fidelity Long Video Generation	✓ Link	2460			148	MCVD	2022-11-23
FIFO-Diffusion: Generating Infinite Videos from Text without Training	✓ Link		596.64	74.44		FIFO-Diffusion	2024-05-19
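
FVD (Fréchet Video Distance) compares real and generated videos by fitting a Gaussian to feature embeddings of each set and taking the Fréchet distance between the two Gaussians: ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The following is a minimal sketch of only that final distance step, assuming NumPy is available and that the caller already has one feature vector per video. The pretrained video network that produces the features, the clip length and the number of sampled videos are not covered, and each paper's exact protocol may differ. The function name frechet_distance is hypothetical.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    Both inputs have shape (num_videos, feature_dim). In FVD these would be
    embeddings from a pretrained video network; here they are simply arrays
    supplied by the caller (assumption for this sketch).
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)

    # rowvar=False: each row is one sample, each column one feature.
    # atleast_2d keeps the 1-feature case as a 1x1 matrix instead of a scalar.
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((Sigma_r Sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of Sigma_r @ Sigma_g, which are real and non-negative for
    # covariance matrices; clip tiny negatives caused by rounding error.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)
```

Identical feature sets give a distance of about 0, and shifting every feature by a constant c across d dimensions leaves the covariance unchanged and adds d * c^2 from the mean term. The eigenvalue route avoids a full matrix square root; an implementation based on scipy.linalg.sqrtm computes the same trace.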

OpenCodePapers