OpenCodePapers
Audio-Visual Speech Recognition on LRS3-TED
Leaderboard
| Paper | Code | Word Error Rate (WER, %) | Model name | Release date |
| --- | --- | --- | --- | --- |
| MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | ✓ | 0.74 | MMS-LLaMA | 2025-03-14 |
| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | ✓ | 0.76 | Whisper-Flamingo | 2024-06-14 |
| Large Language Models are Strong Audio-Visual Speech Recognition Learners | ✓ | 0.77 | Llama-AVSR | 2024-09-18 |
| Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | ✓ | 0.9 | CTC/Attention | 2023-03-25 |
| Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | ✓ | 1.3 | DistillAV | 2025-02-09 |
| Robust Self-Supervised Audio-Visual Speech Recognition | ✓ | 1.4 | AV-HuBERT Large | 2022-01-05 |
| Jointly Learning Visual and Auditory Speech Representations from Raw Data | ✓ | 1.4 | RAVEn Large | 2022-12-12 |
| Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | ✓ | 1.5 | Zero-AVSR | 2025-03-08 |
| End-to-end Audio-visual Speech Recognition with Conformers | ✓ | 2.3 | Hyb-Conformer | 2021-02-12 |
| Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | ✓ | 4.5 | RNN-T | 2019-11-08 |
| Discriminative Multi-modality Speech Recognition | ✓ | 6.8 | EG-seq2seq | 2020-05-12 |
| Deep Audio-Visual Speech Recognition | ✓ | 7.2 | TM-seq2seq | 2018-09-06 |
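
The leaderboard ranks models by word error rate, with lower values being better. As a rough illustration of the metric only, the sketch below assumes the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the listed papers may normalize text differently, and the scores above are presumably reported as percentages aggregated over the whole test set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; tokenization is plain whitespace
    splitting, with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {100 * score:.1f}%")
```

Multiplying the returned fraction by 100 gives a value on the same scale as the table, under the percentage assumption stated above.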