OpenCodePapers

Video-to-Sound Generation on VGG-Sound

Audio Generation, Video-to-Sound Generation
Leaderboard
| Paper | Code | FAD | FD | KLD | Model Name | Release Date |
|---|---|---|---|---|---|---|
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | ✓ Link | 0.79 | 5.22 | | MMAudio-S-16kHz | 2024-12-19 |
| V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models | ✓ Link | 0.841 | 24.168 | | V2A-Mapper | 2023-08-18 |
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | ✓ Link | 0.97 | 4.72 | | MMAudio-L-44.1kHz | 2024-12-19 |
| Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching | ✓ Link | 1.32 | 12.26 | | Frieren | 2024-06-01 |
| Temporally Aligned Audio for Video with Autoregression | ✓ Link | 1.92 | | | V-AURA | 2024-09-20 |
| Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity | | 2.04 | | | MaskVAT_Hybrid | 2024-07-15 |
| Read, Watch and Scream! Sound Generation from Text and Video | ✓ Link | 2.16 | 15.24 | | ReWas | 2024-07-08 |
| Tell What You Hear From What You See -- Video to Audio Generation Through Text | ✓ Link | 2.38 | | 1.41 | VATT-LLama | 2024-11-08 |