MAGVIT: Masked Generative Video Transformer | ✓ Link | 62 | | | | 1 | 15 | 15 | | MAGVIT | 2022-12-10 |
Diffusion Models for Video Prediction and Infilling | ✓ Link | 84.20 | | | | 1 | 20 | 15 | | RaMViD | 2022-06-15 |
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | ✓ Link | 86.9 | | | | 1 | 15 | 15 | | NUWA | 2021-11-24 |
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | ✓ Link | 87.9 | 0.838 | 19.1 | | 2 | 5 | 14 | | MCVD : c2t5p14 | 2022-05-19 |
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | ✓ Link | 89.5 | 0.78 | 16.9 | | 1 | 5 | 15 | | MCVD : c1t5p15 | 2022-05-19 |
FitVid: Overfitting in Pixel-Level Video Prediction | ✓ Link | 93.6 | | | | 1 | 15 | 15 | Uses 100 times more fake than real samples (atypical) | FitVid | 2021-06-24 |
Scaling Autoregressive Video Models | ✓ Link | 94±2 | | | | 1 | 15 | 15 | FVD on only the leftmost samples is 94; FVD on unrolled samples (all subsequences) is 96 | Video Transformer | 2019-06-06 |
CCVS: Context-aware Controllable Video Synthesis | ✓ Link | 99±2 | | | | 1 | 15 | 15 | | CCVS | 2021-07-16 |
VideoGPT: Video Generation using VQ-VAE and Transformers | ✓ Link | 103.3 | | | | 1 | 15 | 15 | | VideoGPT | 2021-04-20 |
Transformation-based Adversarial Video Prediction on Large-Scale Data | | 103.3 | | | | 1 | 15 | 15 | | TrIVD-GAN-FP | 2020-03-09 |
Adversarial Video Generation on Complex Datasets | ✓ Link | 109.8 | | | | 1 | 15 | 15 | | DVD-GAN-FP | 2019-07-15 |
Stochastic Adversarial Video Prediction | ✓ Link | 116.4 | | | | 2 | 14 | 14 | | SAVP (from FVD) | 2018-04-04 |
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | ✓ Link | 118.4 | 0.745 | 16.2 | | 2 | 5 | 28 | | MCVD : c2t5p28 | 2022-05-19 |
Latent Video Transformer | ✓ Link | 125.76±2.90 | | | | 1 | 15 | 15 | | LVT | 2020-06-18 |
VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation | ✓ Link | 131±5 | | | | 3 | 10 | 14 (total 16) | | VideoFlow | 2019-03-04 |
Improved Conditional VRNNs for Video Prediction | ✓ Link | 143.4 | 0.822±0.06 | | 0.055±0.03 | 2 | 10 | 28 | | Hier-VRNN | 2019-04-27 |
Stochastic Adversarial Video Prediction | ✓ Link | 143.43 | 0.795±0.07 | | 0.062±0.03 | 2 | 10 | 28 | | SAVP (from vRNN) | 2018-04-04 |
Improved Conditional VRNNs for Video Prediction | ✓ Link | 149.22 | 0.829±0.06 | | 0.058±0.03 | 2 | 10 | 28 | | VRNN 1L | 2019-04-27 |
Stochastic Adversarial Video Prediction | ✓ Link | 152±9 | 0.7887±0.0092 | 18.44±0.25 | 0.0634±0.0026 | 2 | 12 | 28 | | SAVP (from SRVP) | 2018-04-04 |
Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction | ✓ Link | 159.6 | 0.844 | 21.02 | 0.0936 | 2 | 14 | 28 | | WAM | 2020-02-23 |
Stochastic Latent Residual Video Prediction | ✓ Link | 162±4 | 0.8196±0.0084 | 19.59±0.27 | 0.0574±0.0032 | 2 | 12 | 28 | | SRVP | 2020-02-21 |
SLAMP: Stochastic Latent Appearance and Motion Prediction | ✓ Link | 245±5 | 0.8175±0.084 | 19.67±0.26 | 0.0596±0.0032 | 2 | 10 | 28 | | SLAMP | 2021-08-05 |
Stochastic Video Generation with a Learned Prior | ✓ Link | 255±4 | 0.8058±0.0088 | 18.95±0.26 | 0.0609±0.0034 | 2 | 12 | 28 | | SVG (from SRVP) | 2018-02-21 |
Stochastic Video Generation with a Learned Prior | ✓ Link | 256.62 | 0.816±0.07 | | 0.061±0.03 | 2 | 10 | 28 | | SVG-LP (from vRNN) | 2018-02-21 |
Stochastic Variational Video Prediction | ✓ Link | 262.5 | | | | 2 | 14 | 14 | | SV2P (from FVD) | 2017-10-30 |
Unsupervised Learning for Physical Interaction through Video Prediction | ✓ Link | 296.5 | | | | 2 | 14 | 14 | | CDNA (from FVD) | 2016-05-23 |
Stochastic Video Generation with a Learned Prior | ✓ Link | 315.5 | | | | 2 | 14 | 14 | | SVG-FP (from FVD) | 2018-02-21 |
Latent Video Transformer | ✓ Link | 320.9 | | | | 1 | 15 | 15 | | Baseline (from LVT) | 2020-06-18 |
MoCoGAN: Decomposing Motion and Content for Video Generation | ✓ Link | 503 | | | | 4 | 12 | 12 | | MoCoGAN | 2017-07-17 |
Stochastic Variational Video Prediction | ✓ Link | 965±17 | 0.8169±0.0086 | 20.39±0.27 | 0.0912±0.0053 | 2 | 12 | 28 | | SV2P (from SRVP) | 2017-10-30 |
Stochastic Adversarial Video Prediction | ✓ Link | | 0.815 | 19.09 | | 2 | 14 | 28 | | SAVP-VAE (from WAM) | 2018-04-04 |