On the limit of English conversational speech recognition | | 4.3 | IBM (LSTM+Conformer encoder-decoder) | 2021-05-03 |
Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard | | 4.7 | IBM (LSTM encoder-decoder) | 2020-01-20 |
English Conversational Telephone Speech Recognition by Humans and Machines | | 5.5 | ResNet + BiLSTMs acoustic model | 2017-03-06 |
Achieving Human Parity in Conversational Speech Recognition | | 5.8 | Microsoft 2016b | 2016-10-17 |
The Microsoft 2016 Conversational Speech Recognition System | | 6.2 | Microsoft 2016 | 2016-09-12 |
The Microsoft 2016 Conversational Speech Recognition System | | 6.3 | VGG/ResNet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast | 2016-09-12 |
The IBM 2016 English Conversational Telephone Speech Recognition System | | 6.6 | RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH, N-gram + "model M" + NNLM language model | 2016-04-27 |
Achieving Human Parity in Conversational Speech Recognition | | 6.6 | CNN-LSTM | 2016-10-17 |
The IBM 2016 English Conversational Telephone Speech Recognition System | | 6.9 | IBM 2016 | 2016-04-27 |
The Microsoft 2016 Conversational Speech Recognition System | | 6.9 | RNNLM | 2016-09-12 |
The IBM 2015 English Conversational Telephone Speech Recognition System | | 8.0 | IBM 2015 | 2015-05-21 |
[]() | | 8.5 | HMM-BLSTM trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher | |
[]() | | 9.2 | HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher (10% / 15.1% respectively when trained on SWBD only) | |
[]() | | 10.4 | CNN on MFSC/fbanks + 1 non-conv layer for FMLLR/I-Vectors concatenated in a DNN | |
[]() | | 11 | HMM-TDNN + iVectors | |
[]() | | 11.5 | CNN | |
Very Deep Multilingual Convolutional Neural Networks for LVCSR | | 12.2 | Deep CNN (10 conv, 4 FC layers), multi-scale feature maps | 2015-09-29 |
[]() | | 12.6 | HMM-DNN + sMBR | |
[]() | | 12.6 | DNN sMBR | |
Deep Speech: Scaling up end-to-end speech recognition | ✓ Link | 12.6 | Deep Speech + FSH | 2014-12-17 |
Deep Speech: Scaling up end-to-end speech recognition | ✓ Link | 12.6 | CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trained only on SWB | 2014-12-17 |
[]() | | 12.9 | DNN MMI | |
[]() | | 12.9 | DNN MPE | |
[]() | | 12.9 | DNN BMMI | |
[]() | | 12.9 | HMM-TDNN + pNorm + speed up/down speech | |
Building DNN Acoustic Models for Large Vocabulary Speech Recognition | ✓ Link | 15 | DNN + Dropout | 2014-06-30 |
Building DNN Acoustic Models for Large Vocabulary Speech Recognition | ✓ Link | 16 | DNN | 2014-06-30 |
[]() | | 16.1 | CD-DNN | |
[]() | | 18.5 | DNN-HMM | |
Deep Speech: Scaling up end-to-end speech recognition | ✓ Link | 20 | Deep Speech | 2014-12-17 |