Paper | Code | WER (%) | Model | Date
--- | --- | --- | --- | ---
High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR | | 0.985 | United Med ASR | 2024-11-24 |
Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models | | 1.17 | SAMBA ASR | 2025-01-06 |
Step-Audio 2 Technical Report | | 1.17 | Step-Audio 2 | 2025-08-27 |
Kimi-Audio Technical Report | ✓ Link | 1.28 | Kimi-Audio | 2025-04-25 |
Step-Audio 2 Technical Report | ✓ Link | 1.33 | Step-Audio 2 mini | 2025-08-27 |
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information | ✓ Link | 1.34 | FAdam | 2024-05-21 |
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition | ✓ Link | 1.4 | Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light | 2020-10-20 |
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training | ✓ Link | 1.4 | w2v-BERT XXL | 2021-08-07 |
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition | | 1.46 | parakeet-rnnt-1.1b | 2023-05-08 |
Self-training and Pre-training are Complementary for Speech Recognition | ✓ Link | 1.5 | Conv + Transformer + wav2vec2.0 + pseudo labeling | 2020-10-22 |
Improved Noisy Student Training for Automatic Speech Recognition | ✓ Link | 1.7 | ContextNet + SpecAugment-based Noisy Student Training with Libri-Light | 2020-05-19 |
SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network | | 1.7 | SpeechStew (1B) | 2021-04-05 |
ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition | | 1.75 | Multistream CNN with Self-Attentive SRU (WER includes text normalization) | 2020-05-21 |
Step-Audio 2 Technical Report | | 1.75 | GPT-4o Transcribe | 2025-08-27 |
Multi-Head State Space Model for Speech Recognition | | 1.76 | Stateformer | 2023-05-21 |
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | ✓ Link | 1.8 | wav2vec 2.0 with Libri-Light | 2020-06-20 |
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | ✓ Link | 1.8 | HuBERT with Libri-Light | 2021-06-14 |
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing | ✓ Link | 1.8 | WavLM Large | 2021-10-26 |
E-Branchformer: Branchformer with Enhanced merging for speech recognition | ✓ Link | 1.81 | E-Branchformer (L) + Internal Language Model Estimation | 2022-09-30 |
CR-CTC: Consistency regularization on CTC for improved speech recognition | ✓ Link | 1.88 | Zipformer+pruned transducer w/ CR-CTC (no external language model) | 2024-10-07 |
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | ✓ Link | 1.9 | ContextNet(L) | 2020-05-07 |
Conformer: Convolution-augmented Transformer for Speech Recognition | ✓ Link | 1.9 | Conformer(L) | 2020-05-16 |
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation | | 1.9 | Transformer+Time reduction+Self Knowledge distillation | 2021-03-17 |
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | ✓ Link | 2 | ContextNet(M) | 2020-05-07 |
Improving RNN Transducer Based ASR with Auxiliary Tasks | ✓ Link | 2.0 | Transformer Transducer | 2020-11-05 |
Conformer: Convolution-augmented Transformer for Speech Recognition | ✓ Link | 2 | Conformer(M) | 2020-05-16 |
SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network | | 2.0 | SpeechStew (100M) | 2021-04-05 |
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | ✓ Link | 2.0 | Qwen-Audio | 2023-11-14 |
Zipformer: A faster and better encoder for automatic speech recognition | ✓ Link | 2.00 | Zipformer+pruned transducer (no external language model) | 2023-10-17 |
CR-CTC: Consistency regularization on CTC for improved speech recognition | ✓ Link | 2.02 | Zipformer+CR-CTC (no external language model) | 2024-10-07 |
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures | ✓ Link | 2.03 | Conv + Transformer AM + Pseudo-Labeling (ConvLM with Transformer Rescoring) | 2019-11-19 |
Iterative Pseudo-Labeling for Speech Recognition | ✓ Link | 2.10 | Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 2020-05-19 |
Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces | | 2.10 | CTC + Transformer LM rescoring | 2020-05-19 |
Conformer: Convolution-augmented Transformer for Speech Recognition | ✓ Link | 2.1 | Conformer(S) | 2020-05-16 |
Graph Convolutions Enrich the Self-Attention in Transformers! | ✓ Link | 2.11 | Branchformer + GFSA | 2023-12-07 |
State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions | ✓ Link | 2.20 | Multi-Stream Self-Attention With Dilated 1D Convolutions | 2019-10-01 |
Librispeech Transducer Model with Internal Language Model Prior Correction | ✓ Link | 2.23 | LSTM Transducer | 2021-04-07 |
Transformer-based Acoustic Modeling for Hybrid Speech Recognition | | 2.26 | Hybrid + Transformer LM rescoring | 2019-10-22 |
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation | ✓ Link | 2.3 | Hybrid model with Transformer rescoring | 2019-05-08 |
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | ✓ Link | 2.3 | ContextNet(S) | 2020-05-07 |
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures | ✓ Link | 2.31 | Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 2019-11-19 |
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | ✓ Link | 2.47 | Squeezeformer (L) | 2022-06-02 |
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | ✓ Link | 2.5 | LAS + SpecAugment | 2019-04-18 |
A Comparative Study on Transformer vs RNN in Speech Applications | ✓ Link | 2.6 | Transformer | 2019-09-13 |
QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions | ✓ Link | 2.69 | QuartzNet15x5 | 2019-10-22 |
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | ✓ Link | 2.7 | LAS (no LM) | 2019-04-18 |
Self-training and Pre-training are Complementary for Speech Recognition | ✓ Link | 2.7 | wav2vec_wav2letter | 2020-10-22 |
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit | ✓ Link | 2.8 | Espresso | 2019-09-18 |
Jasper: An End-to-End Convolutional Neural Acoustic Model | ✓ Link | 2.84 | Jasper DR 10x5 (+ Time/Freq Masks) | 2019-04-05 |
Step-Audio 2 Technical Report | | 2.92 | Doubao LLM ASR | 2025-08-27 |
Step-Audio 2 Technical Report | | 2.93 | Qwen Omni | 2025-08-27 |
Jasper: An End-to-End Convolutional Neural Acoustic Model | ✓ Link | 2.95 | Jasper DR 10x5 | 2019-04-05 |
Neural Network Language Modeling with Letter-based Features and Importance Sampling | | 3.06 | tdnn + chain + rnnlm rescoring | 2018-04-15 |
Fully Convolutional Speech Recognition | | 3.26 | Convolutional Speech Recognition | 2018-12-17 |
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | ✓ Link | 3.4 | MT4SSL | 2022-11-14 |
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition | ✓ Link | 3.60 | Model Unit Exploration | 2019-02-05 |
Improved training of end-to-end attention models for speech recognition | ✓ Link | 3.82 | Seq-to-seq attention | 2018-05-08 |
CRF-based Single-stage Acoustic Modeling with CTC Topology | ✓ Link | 4.09 | CTC-CRF 4gram-LM | 2019-04-16 |
(no paper linked) | | 4.3 | HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations | |
Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions | | 4.4 | Centaurus (30 M) | 2025-01-22 |
(no paper linked) | | 4.8 | HMM-TDNN + iVectors | |
Letter-Based Speech Recognition with Gated ConvNets | ✓ Link | 4.8 | Gated ConvNets | 2017-12-22 |
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | ✓ Link | 5.33 | Deep Speech 2 | 2015-12-08 |
Improving End-to-End Speech Recognition with Policy Learning | | 5.42 | CTC + policy learning | 2017-12-19 |
(no paper linked) | | 5.5 | HMM-DNN + pNorm* | |
The PyTorch-Kaldi Speech Recognition Toolkit | ✓ Link | 6.2 | Li-GRU | 2018-11-19 |
Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces | ✓ Link | 6.4 | Snips | 2018-05-25 |
Semi-Supervised Speech Recognition via Local Prior Matching | ✓ Link | 7.19 | Local Prior Matching (Large Model) | 2020-02-24 |
(no paper linked) | | 8.0 | HMM-(SAT)GMM | |
Amortized Neural Networks for Low-Latency Speech Recognition | | 8.6 | AmNet | 2021-08-03 |
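
The WER column above reports word error rate as a percentage (lower is better). For reference, the sketch below shows the standard way WER is computed: word-level Levenshtein distance (substitutions + insertions + deletions) divided by the number of reference words. The function name and example transcripts are illustrative only and are not taken from any of the cited papers; note that published numbers may additionally depend on text normalization of the transcripts, as flagged for the ASAPP-ASR entry.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit (Levenshtein) distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution (or match)
    return dp[-1][-1] / max(len(ref), 1)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat on the hat"
    # 1 deletion ("sat") + 1 substitution ("mat" -> "hat") over 6 reference words ≈ 33.3% WER
    print(f"WER = {100 * word_error_rate(ref, hyp):.2f}%")
```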