OpenCodePapers

Speech Synthesis on LibriTTS

Accented Speech Recognition | Speech Synthesis

Leaderboard

Paper	Code	PESQ	M-STFT	MCD	Periodicity	V/UV F1	ModelName	ReleaseDate
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization	✓ Link	4.454	0.7358		0.0528	0.9756	PeriodWave-Turbo-L	2024-08-15
BigVGAN: A Universal Neural Vocoder with Large-Scale Training	✓ Link	4.362	0.7026	0.2903	0.0593	0.9793	BigVGAN-v2	2022-06-09
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks	✓ Link	4.3536	0.7982		0.0751	0.9745	EVA-GAN-big	2024-01-31
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation	✓ Link	4.248	1.0269		0.0765	0.9651	PeriodWave + FreeU	2024-08-14
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction	✓ Link	4.228			0.090	0.968	RFWave	2024-03-08
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network	✓ Link	4.120	0.7992	0.4129	0.0924	0.9644	BigVSAN (w/ snakebeta)	2023-09-06
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network	✓ Link	4.116	0.7881	0.3381	0.0935	0.9635	BigVSAN	2023-09-06
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks	✓ Link	4.0330	0.9485		0.0942	0.9658	EVA-GAN-base	2024-01-31
BigVGAN: A Universal Neural Vocoder with Large-Scale Training	✓ Link	4.027	0.7997	0.3745	0.1018	0.9598	BigVGAN	2022-06-09
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis	✓ Link	3.70			0.101	0.9582	Vocos	2023-06-01
BigVGAN: A Universal Neural Vocoder with Large-Scale Training	✓ Link	3.519	0.8788	0.4564	0.1287	0.9459	BigVGAN-base	2022-06-09
WaveGlow: A Flow-based Generative Network for Speech Synthesis	✓ Link	3.138	1.3099	2.3591	0.1485	0.9378	WaveGlow	2018-10-31
WaveFlow: A Compact Flow-based Model for Raw Audio	✓ Link	3.027	1.1120	1.2455	0.1416	0.9410	WaveFlow	2019-12-03
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis	✓ Link	2.947	1.0017	0.6603	0.1565	0.9300	HiFi-GAN	2020-10-12
Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions	✓ Link	1.701	2.2358	1.8854	0.3044	0.8144	SC-WaveRNN	2020-08-09
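
The table above is ordered by PESQ, but the other columns can rank the models differently, and several cells are blank. The following is a minimal Python sketch of re-ranking a few of the rows on any one metric. The metric directions (higher is better for PESQ and V/UV F1, lower is better for M-STFT, MCD, and Periodicity) are an assumption based on common vocoder-evaluation practice and are not stated on this page. The names `HIGHER_IS_BETTER`, `ROWS`, and `rank_by` are hypothetical helpers introduced only for illustration.

```python
# Assumed metric directions: True means a higher value is better.
# The page itself does not state these; treat them as an assumption.
HIGHER_IS_BETTER = {
    "PESQ": True,
    "M-STFT": False,
    "MCD": False,
    "Periodicity": False,
    "V/UV F1": True,
}

# Four rows copied from the leaderboard; None marks a blank cell.
ROWS = [
    {"ModelName": "PeriodWave-Turbo-L", "PESQ": 4.454, "M-STFT": 0.7358,
     "MCD": None, "Periodicity": 0.0528, "V/UV F1": 0.9756},
    {"ModelName": "BigVGAN-v2", "PESQ": 4.362, "M-STFT": 0.7026,
     "MCD": 0.2903, "Periodicity": 0.0593, "V/UV F1": 0.9793},
    {"ModelName": "BigVSAN", "PESQ": 4.116, "M-STFT": 0.7881,
     "MCD": 0.3381, "Periodicity": 0.0935, "V/UV F1": 0.9635},
    {"ModelName": "HiFi-GAN", "PESQ": 2.947, "M-STFT": 1.0017,
     "MCD": 0.6603, "Periodicity": 0.1565, "V/UV F1": 0.9300},
]


def rank_by(rows, metric):
    """Return model names ordered best-first on `metric`.

    Rows with no reported value for the metric are skipped rather than
    treated as zero, so a blank cell never counts as a best or worst score.
    """
    scored = [row for row in rows if row[metric] is not None]
    scored.sort(key=lambda row: row[metric], reverse=HIGHER_IS_BETTER[metric])
    return [row["ModelName"] for row in scored]


if __name__ == "__main__":
    for metric in HIGHER_IS_BETTER:
        print(metric, rank_by(ROWS, metric))
```

Skipping blank cells keeps models such as PeriodWave-Turbo-L out of the MCD ranking instead of placing them first or last by accident.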