| wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | ✓ Link | 8.3 | wav2vec 2.0 | 2020-06-20 |
| vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations | ✓ Link | 11.6 | vq-wav2vec | 2019-10-12 |
| The PyTorch-Kaldi Speech Recognition Toolkit | ✓ Link | 14.2 | Li-GRU + Dropout + BatchNorm + Monophone Reg | 2018-11-19 |
| The PyTorch-Kaldi Speech Recognition Toolkit | ✓ Link | 14.5 | LSTM + Dropout + BatchNorm + Monophone Reg | 2018-11-19 |
| wav2vec: Unsupervised Pre-training for Speech Recognition | ✓ Link | 14.7 | wav2vec | 2019-04-11 |
| The PyTorch-Kaldi Speech Recognition Toolkit | ✓ Link | 14.9 | GRU + Dropout + BatchNorm + Monophone Reg | 2018-11-19 |
| Light Gated Recurrent Units for Speech Recognition | ✓ Link | 14.9 | Li-GRU + fMLLR features | 2018-03-26 |
| The PyTorch-Kaldi Speech Recognition Toolkit | ✓ Link | 15.9 | RNN + Dropout + BatchNorm + Monophone Reg | 2018-11-19 |
| The PyTorch-Kaldi Speech Recognition Toolkit | ✓ Link | 16.0 | LSTM | 2018-11-19 |
| The PyTorch-Kaldi Speech Recognition Toolkit | ✓ Link | 16.3 | Li-GRU | 2018-11-19 |
| | | 16.5 | Hierarchical maxout CNN + Dropout | |
| The PyTorch-Kaldi Speech Recognition Toolkit | ✓ Link | 16.5 | RNN | 2018-11-19 |
| The PyTorch-Kaldi Speech Recognition Toolkit | ✓ Link | 16.6 | GRU | 2018-11-19 |
| | | 16.7 | CNN in time and frequency + dropout, 17.6% w/o dropout | |
| Light Gated Recurrent Units for Speech Recognition | ✓ Link | 16.7 | Light Gated Recurrent Units | 2018-03-26 |
| Segmental Recurrent Neural Networks for End-to-end Speech Recognition | | 17.3 | RNN-CRF on 24(x3) MFSC | 2016-03-01 |
| Attention-Based Models for Speech Recognition | ✓ Link | 17.6 | Bi-RNN + Attention | 2015-06-24 |
| Speech Recognition with Deep Recurrent Neural Networks | ✓ Link | 17.7 | Bi-LSTM + skip connections w/ CTC | 2013-03-22 |
| Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition | ✓ Link | 19.64 | QCNN-10L-256FM | 2018-06-20 |
| Online and Linear-Time Attention by Enforcing Monotonic Alignments | ✓ Link | 20.1 | Soft Monotonic Attention (offline) | 2017-04-03 |
| Attention model for articulatory features detection | ✓ Link | 20.4 | LAS multitask with indicators sampling | 2019-07-02 |
| Long short-term memory and learning-to-learn in networks of spiking neurons | ✓ Link | 33.2 | LSNN | 2018-03-26 |