Paper | Code | FAD | FD | KLD | Model Name | Release Date |
---|---|---|---|---|---|---|
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | ✓ Link | 0.79 | 5.22 | | MMAudio-S-16kHz | 2024-12-19 |
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models | ✓ Link | 0.841 | 24.168 | | V2A-Mapper | 2023-08-18 |
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | ✓ Link | 0.97 | 4.72 | | MMAudio-L-44.1kHz | 2024-12-19 |
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching | ✓ Link | 1.32 | 12.26 | | Frieren | 2024-06-01 |
Temporally Aligned Audio for Video with Autoregression | ✓ Link | 1.92 | | | V-AURA | 2024-09-20 |
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity | | 2.04 | | | MaskVAT_Hybrid | 2024-07-15 |
Read, Watch and Scream! Sound Generation from Text and Video | ✓ Link | 2.16 | 15.24 | | ReWas | 2024-07-08 |
Tell What You Hear From What You See -- Video to Audio Generation Through Text | ✓ Link | 2.38 | | 1.41 | VATT-LLama | 2024-11-08 |