OpenCodePapers

lipreading-on-lrs2

Natural Language Transduction / Lipreading
Leaderboard
| Paper | Code | Word Error Rate (WER) | Model Name | Release Date |
|---|---|---|---|---|
| Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | ✓ | 14.6 | Auto-AVSR | 2023-03-25 |
| Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs | ✓ | 15.4 | USR | 2024-11-04 |
| SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | ✓ | 16.5 | SyncVSR | 2024-06-18 |
| Jointly Learning Visual and Auditory Speech Representations from Raw Data | ✓ | 18.6 | RAVEn Large | 2022-12-12 |
| Sub-word Level Lip Reading With Visual Attention | | 22.6 | VTP (more data) | 2021-10-14 |
| ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | | 24.6 | ES³ Large + extLM | 2024-01-01 |
| Visual Speech Recognition for Multiple Languages in the Wild | ✓ | 25.5 | CTC/Attention (LRW+LRS2/3+AVSpeech) | 2022-02-26 |
| ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | | 26.7 | ES³ Large | 2024-01-01 |
| ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | | 28.7 | ES³ Base + extLM | 2024-01-01 |
| Sub-word Level Lip Reading With Visual Attention | | 28.9 | VTP | 2021-10-14 |
| SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | ✓ | 28.9 | SyncVSR | 2024-06-18 |
| ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | | 29.3 | ES³ Base* + extLM | 2024-01-01 |
| ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | | 30.7 | ES³ Base | 2024-01-01 |
| ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | | 31.4 | ES³ Base* | 2024-01-01 |
| Visual Speech Recognition for Multiple Languages in the Wild | ✓ | 32.9 | CTC/Attention | 2022-02-26 |
| End-to-end Audio-visual Speech Recognition with Conformers | ✓ | 39.1 | Hybrid CTC / Attention | 2021-02-12 |
| Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | ✓ | 43.2 | MoCo + wav2vec (w/o extLM) | 2022-02-24 |
| Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | ✓ | 44.5 | Multi-head Visual-Audio Memory | 2022-04-04 |
| Deep Audio-Visual Speech Recognition | ✓ | 48.3 | TM-seq2seq + extLM | 2018-09-06 |
| Audio-visual Recognition of Overlapped speech for the LRS2 dataset | | 48.86 | LF-MMI TDNN | 2020-01-06 |
| Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | | 50 | Hybrid CTC / Attention | 2018-09-28 |
| Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | | 51.7 | Conv-seq2seq | 2019-10-01 |
| ASR is all you need: cross-modal distillation for lip reading | | 53.2 | CTC + KD ASR | 2019-11-28 |
| Deep Audio-Visual Speech Recognition | ✓ | 54.7 | TM-CTC + extLM | 2018-09-06 |
| Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers | ✓ | 65.29 | LIBS | 2019-11-26 |