OpenCodePapers

Speech Recognition on LibriSpeech test-other

Speech Recognition
Leaderboard
| Paper | Code | Word Error Rate (WER, %) | Model Name | Release Date |
|---|---|---|---|---|
| Kimi-Audio Technical Report | ✓ | 2.42 | Kimi-Audio | 2025-04-25 |
| Step-Audio 2 Technical Report | | 2.42 | Step-Audio 2 | 2025-08-27 |
| Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models | | 2.48 | SAMBA ASR | 2025-01-06 |
| FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information | ✓ | 2.49 | FAdam | 2024-05-21 |
| W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training | ✓ | 2.5 | w2v-BERT XXL | 2021-08-07 |
| Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition | ✓ | 2.6 | Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light | 2020-10-20 |
| Step-Audio 2 Technical Report | ✓ | 2.86 | Step-Audio 2 mini | 2025-08-27 |
| HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | ✓ | 2.9 | HuBERT with Libri-Light | 2021-06-14 |
| wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | ✓ | 3.0 | wav2vec 2.0 with Libri-Light | 2020-06-20 |
| Self-training and Pre-training are Complementary for Speech Recognition | ✓ | 3.1 | Conv + Transformer + wav2vec2.0 + pseudo labeling | 2020-10-22 |
| WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing | ✓ | 3.2 | WavLM Large | 2021-10-26 |
| SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network | | 3.3 | SpeechStew (1B) | 2021-04-05 |
| Improved Noisy Student Training for Automatic Speech Recognition | ✓ | 3.4 | ContextNet + SpecAugment-based Noisy Student Training with Libri-Light | 2020-05-19 |
| E-Branchformer: Branchformer with Enhanced merging for speech recognition | ✓ | 3.65 | E-Branchformer (L) + Internal Language Model Estimation | 2022-09-30 |
| data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | ✓ | 3.7 | data2vec | 2022-02-07 |
| Iterative Pseudo-Labeling for Speech Recognition | ✓ | 3.83 | Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 2020-05-19 |
| Conformer: Convolution-augmented Transformer for Speech Recognition | ✓ | 3.9 | Conformer(L) | 2020-05-16 |
| CR-CTC: Consistency regularization on CTC for improved speech recognition | ✓ | 3.95 | Zipformer+pruned transducer w/ CR-CTC (no external language model) | 2024-10-07 |
| SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network | | 4.0 | SpeechStew (100M) | 2021-04-05 |
| wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | ✓ | 4.1 | wav2vec 2.0 | 2020-06-20 |
| ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | ✓ | 4.1 | ContextNet(L) | 2020-05-07 |
| End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures | ✓ | 4.11 | Conv + Transformer AM (ConvLM with Transformer Rescoring) | 2019-11-19 |
| Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces | | 4.20 | CTC + Transformer LM rescoring | 2020-05-19 |
| Improving RNN Transducer Based ASR with Auxiliary Tasks | ✓ | 4.20 | Transformer Transducer | 2020-11-05 |
| Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | ✓ | 4.2 | Qwen-Audio | 2023-11-14 |
| Step-Audio 2 Technical Report | | 4.23 | GPT-4o Transcribe | 2025-08-27 |
| Conformer: Convolution-augmented Transformer for Speech Recognition | ✓ | 4.3 | Conformer(M) | 2020-05-16 |
| CR-CTC: Consistency regularization on CTC for improved speech recognition | ✓ | 4.35 | Zipformer+CR-CTC (no external language model) | 2024-10-07 |
| Zipformer: A faster and better encoder for automatic speech recognition | ✓ | 4.38 | Zipformer+pruned transducer (no external language model) | 2023-10-17 |
| ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition | | 4.46 | Multistream CNN with Self-Attentive SRU | 2020-05-21 |
| ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | ✓ | 4.5 | ContextNet(M) | 2020-05-07 |
| Transformer-based Acoustic Modeling for Hybrid Speech Recognition | | 4.85 | hybrid + Transformer LM rescoring | 2019-10-22 |
| Graph Convolutions Enrich the Self-Attention in Transformers! | ✓ | 4.94 | Branchformer + GFSA | 2023-12-07 |
| RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation | ✓ | 5.0 | Hybrid model with Transformer rescoring | 2019-05-08 |
| Conformer: Convolution-augmented Transformer for Speech Recognition | ✓ | 5.0 | Conformer(S) | 2020-05-16 |
| Step-Audio 2 Technical Report | | 5.07 | Qwen Omni | 2025-08-27 |
| End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures | ✓ | 5.18 | Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 2019-11-19 |
| Step-Audio 2 Technical Report | | 5.32 | Doubao LLM ASR | 2025-08-27 |
| ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | ✓ | 5.5 | ContextNet(S) | 2020-05-07 |
| Librispeech Transducer Model with Internal Language Model Prior Correction | ✓ | 5.6 | LSTM Transducer | 2021-04-07 |
| A Comparative Study on Transformer vs RNN in Speech Applications | ✓ | 5.7 | Transformer | 2019-09-13 |
| SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | ✓ | 5.8 | LAS + SpecAugment | 2019-04-18 |
| State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions | ✓ | 5.80 | Multi-Stream Self-Attention With Dilated 1D Convolutions | 2019-10-01 |
| Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | ✓ | 5.97 | Squeezeformer (L) | 2022-06-02 |
| SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | ✓ | 6.5 | LAS (no LM) | 2019-04-18 |
| Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition | ✓ | 6.85 | Conformer with Relaxed Attention | 2021-07-02 |
| QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions | ✓ | 7.25 | QuartzNet15x5 | 2019-10-22 |
| Neural Network Language Modeling with Letter-based Features and Importance Sampling | | 7.63 | tdnn + chain + rnnlm rescoring | 2018-04-15 |
| Jasper: An End-to-End Convolutional Neural Acoustic Model | ✓ | 7.84 | Jasper DR 10x5 (+ Time/Freq Masks) | 2019-04-05 |
| Espresso: A Fast End-to-end Neural Speech Recognition Toolkit | ✓ | 8.7 | Espresso | 2019-09-18 |
| Jasper: An End-to-End Convolutional Neural Acoustic Model | ✓ | 8.79 | Jasper DR 10x5 | 2019-04-05 |
| MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | ✓ | 9.6 | MT4SSL | 2022-11-14 |
| Fully Convolutional Speech Recognition | | 10.47 | Convolutional Speech Recognition | 2018-12-17 |
| CRF-based Single-stage Acoustic Modeling with CTC Topology | ✓ | 10.65 | CTC-CRF 4gram-LM | 2019-04-16 |
| | | 12.5 | TDNN + pNorm + speed up/down speech | |
| Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | ✓ | 13.25 | Deep Speech 2 | 2015-12-08 |
| Semi-Supervised Speech Recognition via Local Prior Matching | ✓ | 15.28 | Local Prior Matching (Large Model, ConvLM LM) | 2020-02-24 |
| Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces | ✓ | 16.5 | Snips | 2018-05-25 |
| Semi-Supervised Speech Recognition via Local Prior Matching | ✓ | 20.84 | Local Prior Matching (Large Model) | 2020-02-24 |
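The leaderboard ranks models by word error rate, the word-level edit distance between a model's transcript (the hypothesis) and the reference transcript: WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in the best alignment and N is the number of reference words. Values in the table are percentages. The sketch below shows how that number can be computed, assuming plain whitespace tokenization and no text normalization; real evaluation pipelines usually normalize case and punctuation first, which is omitted here. The function name `word_error_rate` is hypothetical and not taken from any of the listed papers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Assumption: tokens are whitespace-separated words; no normalization
    (lowercasing, punctuation stripping) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first (i - 1) reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # Aligning i reference words against zero hypothesis words costs i deletions.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr

    # Total edits divided by the number of reference words, as a percentage.
    return 100.0 * prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against N reference words, WER is not capped at 100%. Read this way, a test-other WER of 2.42 corresponds to roughly 2.42 word errors per 100 reference words.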