OpenCodePapers

Audio Generation on AudioCaps
Leaderboard
Paper | Code | Metrics in column order FD_openl3, FAD, FD, KL_passt, IS, CLAP_LAION, CLAP_MS (missing entries omitted) | Model name | Release date
ETTA: Elucidating the Design Space of Text-to-Audio Models | ✓ | 61.79, 2.03, 10.10, 1.13, 14.29, 0.60, 0.43 | ETTA-FT-AC-100k | 2024-12-26
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | ✓ | 75.1, 1.15, 12.2, 0.488 | TangoFlux | 2024-12-30
Stable Audio Open | ✓ | 78.24, 2.14, 0.35, 0.34 | Stable Audio Open | 2024-07-19
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | ✓ | 79.7, 1.23, 10.7, 0.438 | TangoFlux-base | 2024-12-30
ETTA: Elucidating the Design Space of Text-to-Audio Models | ✓ | 80.13, 2.51, 13.12, 1.22, 14.36, 0.54, 0.43 | ETTA | 2024-12-26
Fast Timing-Conditioned Latent Audio Diffusion | ✓ | 103.66, 2.89, 0.41 | Stable Audio | 2024-02-07
Long-form music generation with latent diffusion | ✓ | 110.62, 2.70 | Stable Audio 2.0 | 2024-04-16
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | ✓ | 158.04, 2.02, 26.18, 1.68, 8.55, 0.53, 0.37 | AudioLDM2-large | 2023-08-10
AudioGen: Textually Guided Audio Generation | ✓ | 185.53, 3.13, 1.42 | AudioGen | 2022-09-30
Audiobox: Unified Audio Generation with Natural Language Prompts |  | 0.77, 8.30, 12.70, 0.71 | Audiobox Sound | 2023-12-25
Taming Data and Transformers for Audio Generation | ✓ | 1.21, 16.51, 0.668 | GenAu-Large | 2024-06-27
Retrieval-Augmented Text-to-Audio Generation |  | 1.37 | Re-AudioLDM-L | 2023-09-14
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | ✓ | 1.42, 0.243 | AudioLDM 2-AC-Large | 2023-08-10
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | ✓ | 1.59, 24.52 | TANGO | 2023-04-24
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | ✓ | 1.63, 21.99 | Auffusion | 2024-01-02
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | ✓ | 1.76, 23.08 | Auffusion-Full | 2024-01-02
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | ✓ | 1.80, 11.75 | Make-An-Audio 2 | 2023-05-29
Any-to-Any Generation via Composable Diffusion | ✓ | 1.80, 22.90 | CoDi | 2023-05-19
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | ✓ | 1.96, 23.31 | AudioLDM-L-Full | 2023-01-29
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | ✓ | 2.18, 20.44 | Consistency TTA (Single-step generation) | 2023-09-19
Improving Text-To-Audio Models with Synthetic Captions | ✓ | 2.54, 17.19, 11.04, 0.527 | Tango-AF&AC-FT-AC | 2024-06-18
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | ✓ | 2.66, 18.32 | Make-An-Audio | 2023-01-30
Diffsound: Discrete Diffusion Model for Text-to-sound Generation | ✓ | 7.75, 47.68 | Diffsound | 2022-07-20