OpenCodePapers

Text-to-Video Generation on MSR-VTT

Leaderboard

Paper	Code	FVD	CLIPSIM	CLIP-FID	FID	Model Name	Release Date
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis		104.0	0.2793	9.35		Snap Video (512×288)	2024-02-22
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis		110.4	0.2793	8.48		Snap Video (288×288)	2024-02-22
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization	✓ Link	188.36	0.3012		11.27	Video-LaVIT	2024-02-05
VideoPoet: A Large Language Model for Zero-Shot Video Generation		213	0.3123			VideoPoet	2023-12-21
Make Pixels Dance: High-Dynamic Video Generation		381	0.3125			PixelDance	2023-11-18
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation	✓ Link	406	0.2947		8.60	HiGen	2023-12-07
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos	✓ Link	441	0.2991		8.19	TF-T2V	2023-12-25
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation	✓ Link	538	0.3072		13.08	Show-1	2023-09-27
ModelScope Text-to-Video Technical Report	✓ Link	550	0.2930		11.09	ModelScopeT2V	2023-08-12
VideoComposer: Compositional Video Synthesis with Motion Controllability	✓ Link	580	0.2932			VideoComposer	2023-06-03
MagicVideo: Efficient Video Generation With Latent Diffusion Models		998			36.5	MagicVideo	2022-11-20
Make-A-Video: Text-to-Video Generation without Text-Video Data	✓ Link		0.3049	13.17	13.17	Make-A-Video	2022-09-29
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models	✓ Link		0.2929			Video LDM	2023-04-18
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation	✓ Link		0.2644		23.4	MMVG	2022-11-23
Make-A-Video: Text-to-Video Generation without Text-Video Data	✓ Link		0.2631	23.59	23.59	CogVideo (English)	2022-09-29
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models	✓ Link		0.2614	24.78		CogVideo (Chinese)	2023-04-18
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion	✓ Link		0.2439	47.68	47.68	NUWA	2021-11-24
GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions	✓ Link		0.2402			GODIVA	2021-04-30