MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models | ✓ | 1.12 | | 22.65 | 0.89 | | | | MeLFusion (image-conditioned) | 2024-06-07 |
FLUX that Plays Music | ✓ | 1.43 | | | 1.25 | 2.98 | | | FLUXMusic | 2024-09-01 |
Quality-aware Masked Diffusion Transformer for Enhanced Music Generation | ✓ | 1.65 | | | 1.31 | 2.80 | | | OpenMusic (QA-MDT) | 2024-05-24 |
ETTA: Elucidating the Design Space of Text-to-Audio Models | ✓ | 1.91 | 92.18 | 10.06 | 0.84 | 3.32 | 0.51 | 0.53 | ETTA | 2024-12-26 |
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models | ✓ | 2.00 | | | 1.29 | | | | JEN-1 | 2023-08-09 |
Noise2Music: Text-conditioned Music Generation with Diffusion Models | | 2.134 | | | | | | | Noise2Music waveform | 2023-02-08 |
Improving Text-To-Audio Models with Synthetic Captions | ✓ | 2.21 | 270.32 | 22.69 | 0.94 | 2.79 | 0.51 | 0.43 | TANGO-AF | 2024-06-18 |
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | ✓ | 2.93 | 190.16 | 16.34 | 1.00 | 2.59 | 0.48 | 0.47 | AudioLDM2-large | 2023-08-10 |
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | ✓ | 3.13 | | | 1.20 | | | | AudioLDM 2-Full | 2023-08-10 |
Simple and Controllable Music Generation | ✓ | 3.4 | | | 1.23 | | | | MusicGen w/o melody (1.5B) | 2023-06-08 |
Stable Audio Open | ✓ | 3.51 | 127.20 | 36.42 | 1.32 | 2.93 | 0.48 | 0.49 | Stable Audio Open | 2024-07-19 |
UniAudio: An Audio Foundation Model Toward Universal Audio Generation | ✓ | 3.65 | | | 1.87 | | | | UniAudio | 2023-10-01 |
Simple and Controllable Music Generation | ✓ | 3.8 | 197.12 | | 1.31 | | | | MusicGen w/o melody (3.3B) | 2023-06-08 |
Noise2Music: Text-conditioned Music Generation with Diffusion Models | | 3.840 | | | | | | | Noise2Music spectrogram | 2023-02-08 |
MusicLM: Generating Music From Text | ✓ | 4.0 | | | | | | | MusicLM | 2023-01-26 |
Simple and Controllable Music Generation | ✓ | 5.0 | | | 1.31 | | | | MusicGen w/ random melody (1.5B) | 2023-06-08 |
Efficient Neural Music Generation | | 5.41 | | | | | | | MeLoDy | 2023-05-25 |
MusicLM: Generating Music From Text | ✓ | 9.6 | | | | | | | Mubert | 2023-01-26 |
MusicLM: Generating Music From Text | ✓ | 13.4 | | | | | | | Riffusion | 2023-01-26 |
Fast Timing-Conditioned Latent Audio Diffusion | ✓ | | 108.69 | | 0.80 | | | | Stable Audio | 2024-02-07 |
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | ✓ | | 354.05 | | 1.53 | | | | AudioLDM2-music | 2023-08-10 |