Lipreading on LRS2

Leaderboard

Paper	Code	Word Error Rate (WER, %)	Model Name	Release Date
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels	✓ Link	14.6	Auto-AVSR	2023-03-25
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs	✓ Link	15.4	USR	2024-11-04
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization	✓ Link	16.5	SyncVSR	2024-06-18
Jointly Learning Visual and Auditory Speech Representations from Raw Data	✓ Link	18.6	RAVEn Large	2022-12-12
Sub-word Level Lip Reading With Visual Attention		22.6	VTP (more data)	2021-10-14
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations		24.6	ES³ Large + extLM	2024-01-01
Visual Speech Recognition for Multiple Languages in the Wild	✓ Link	25.5	CTC/Attention (LRW+LRS2/3+AVSpeech)	2022-02-26
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations		26.7	ES³ Large	2024-01-01
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations		28.7	ES³ Base + extLM	2024-01-01
Sub-word Level Lip Reading With Visual Attention		28.9	VTP	2021-10-14
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization	✓ Link	28.9	SyncVSR	2024-06-18
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations		29.3	ES³ Base* + extLM	2024-01-01
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations		30.7	ES³ Base	2024-01-01
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations		31.4	ES³ Base*	2024-01-01
Visual Speech Recognition for Multiple Languages in the Wild	✓ Link	32.9	CTC/Attention	2022-02-26
End-to-end Audio-visual Speech Recognition with Conformers	✓ Link	39.1	Hybrid CTC / Attention	2021-02-12
Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition	✓ Link	43.2	MoCo + wav2vec (w/o extLM)	2022-02-24
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading	✓ Link	44.5	Multi-head Visual-Audio Memory	2022-04-04
Deep Audio-Visual Speech Recognition	✓ Link	48.3	TM-seq2seq + extLM	2018-09-06
Audio-visual Recognition of Overlapped speech for the LRS2 dataset		48.86	LF-MMI TDNN	2020-01-06
Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture		50	Hybrid CTC / Attention	2018-09-28
Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading		51.7	Conv-seq2seq	2019-10-01
ASR is all you need: cross-modal distillation for lip reading		53.2	CTC + KD ASR	2019-11-28
Deep Audio-Visual Speech Recognition	✓ Link	54.7	TM-CTC + extLM	2018-09-06
Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers	✓ Link	65.29	LIBS	2019-11-26
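
For readers unfamiliar with the metric column, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words and expressed as a percentage. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the listed papers may normalize text (casing, punctuation) differently before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance over reference length, as a percentage.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    which is an assumption and not taken from any of the listed papers.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(round(word_error_rate("the cat sat on the mat",
                                "the cat sat on a mat"), 2))
```

Lower values are better, and because insertions are counted, a WER above 100 is possible when the hypothesis is much longer than the reference.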
