Lipreading on LRS3-TED

Leaderboard

Paper	Code	Word Error Rate (WER, %)	Model Name	Release Date
Conformers are All You Need for Visual Speech Recognition		12.8	LP + Conformer	2023-02-17
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels	✓ Link	19.1	Auto-AVSR	2023-03-25
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization	✓ Link	21.5	SyncVSR	2024-06-18
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs	✓ Link	21.5	USR (self + semi-supervised)	2024-11-04
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs	✓ Link	22.3	USR (self-supervised)	2024-11-04
Jointly Learning Visual and Auditory Speech Representations from Raw Data	✓ Link	23.4	RAVEn Large	2022-12-12
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing	✓ Link	25.4	VSP-LLM	2024-02-23
Relaxed Attention for Transformer Models	✓ Link	25.51	AV-HuBERT Large + Relaxed Attention + LM	2022-09-20
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models	✓ Link	26.2	DistillAV	2025-02-09
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction	✓ Link	26.9	AV-HuBERT Large	2022-01-05
Sub-word Level Lip Reading With Visual Attention		30.7	VTP (more data)	2021-10-14
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization	✓ Link	31.2	SyncVSR	2024-06-18
Visual Speech Recognition for Multiple Languages in the Wild	✓ Link	31.5	CTC/Attention (LRW+LRS2/3+AVSpeech)	2022-02-26
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition	✓ Link	33.6	RNN-T	2019-11-08
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations		37.1	ES³ Large	2024-01-01
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations		40.3	ES³ Base	2024-01-01
Sub-word Level Lip Reading With Visual Attention		40.6	VTP	2021-10-14
End-to-end Audio-visual Speech Recognition with Conformers	✓ Link	43.3	Hyb + Conformer	2021-02-12
Large-Scale Visual Speech Recognition		55.1	CTC-V2P	2018-07-13
Discriminative Multi-modality Speech Recognition	✓ Link	57.8	EG-seq2seq	2020-05-12
Deep Audio-Visual Speech Recognition	✓ Link	58.9	TM-seq2seq	2018-09-06
ASR is all you need: cross-modal distillation for lip reading		59.8	CTC + KD	2019-11-28
Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading		60.1	Conv-seq2seq	2019-10-01
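
The leaderboard ranks models by Word Error Rate, where lower is better. As a rough illustration of what that column measures, the sketch below computes WER for a single reference/hypothesis pair using the standard definition: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, expressed as a percentage. The function name word_error_rate and the whitespace tokenization are assumptions made for this example. The listed papers may normalize text differently and typically aggregate errors over the whole test set rather than per sentence.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent for one reference/hypothesis pair.

    Hypothetical helper for illustration: tokens are split on whitespace
    and compared exactly (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.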
