| Paper | Code | WER (%) | Notes | Published |
| :---- | :--: | :-----: | :---- | :-------- |
| On the limit of English conversational speech recognition | | 6.8 | IBM (LSTM + Conformer encoder-decoder) | 2021-05-03 |
| Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard | | 7.8 | IBM (LSTM encoder-decoder) | 2020-01-20 |
| English Conversational Telephone Speech Recognition by Humans and Machines | | 10.3 | ResNet + BiLSTM acoustic model | 2017-03-06 |
| The Microsoft 2016 Conversational Speech Recognition System | | 11.9 | VGG/ResNet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH; N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast | 2016-09-12 |
| The IBM 2016 English Conversational Telephone Speech Recognition System | | 12.2 | RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH; N-gram + "model M" + NNLM language model | 2016-04-27 |
| | | 13 | HMM-BLSTM trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher | |
| | | 13.3 | HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher (10% / 15.1% respectively when trained on SWBD only) | |
| Deep Speech: Scaling up end-to-end speech recognition | ✓ Link | 16 | CNN + Bi-RNN + CTC (speech to letters); 25.9% WER if trained only on SWB | 2014-12-17 |
| | | 17.1 | HMM-TDNN + iVectors | |
| | | 18.4 | HMM-DNN + sMBR | |
| Building DNN Acoustic Models for Large Vocabulary Speech Recognition | ✓ Link | 19.1 | DNN + Dropout | 2014-06-30 |
| | | 19.3 | HMM-TDNN + pNorm + speed up/down speech | |
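The metric in the table is word error rate: the word-level Levenshtein distance (substitutions + insertions + deletions) between the hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch of that computation (the `wer` function below is illustrative only; official benchmark scoring is typically done with NIST's sclite tool, which also handles transcript normalization and alignment conventions this sketch ignores):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    Illustrative implementation via dynamic programming; not the official
    scoring pipeline for any of the benchmark results above.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + sub_cost,  # match / substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat mat")` gives 2/6, since two reference words are deleted; a WER of 6.8 in the table means 6.8 such errors per 100 reference words.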