OpenCodePapers

Audio-Visual Speech Recognition on LRS3-TED
Leaderboard
| Paper | Code | WER (%) | Model | Release Date |
|---|---|---|---|---|
| MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | ✓ Link | 0.74 | MMS-LLaMA | 2025-03-14 |
| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | ✓ Link | 0.76 | Whisper-Flamingo | 2024-06-14 |
| Large Language Models are Strong Audio-Visual Speech Recognition Learners | ✓ Link | 0.77 | Llama-AVSR | 2024-09-18 |
| Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | ✓ Link | 0.9 | CTC/Attention | 2023-03-25 |
| Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | ✓ Link | 1.3 | DistillAV | 2025-02-09 |
| Robust Self-Supervised Audio-Visual Speech Recognition | ✓ Link | 1.4 | AV-HuBERT Large | 2022-01-05 |
| Jointly Learning Visual and Auditory Speech Representations from Raw Data | ✓ Link | 1.4 | RAVEn Large | 2022-12-12 |
| Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | ✓ Link | 1.5 | Zero-AVSR | 2025-03-08 |
| End-to-end Audio-visual Speech Recognition with Conformers | ✓ Link | 2.3 | Hyb-Conformer | 2021-02-12 |
| Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | ✓ Link | 4.5 | RNN-T | 2019-11-08 |
| Discriminative Multi-modality Speech Recognition | ✓ Link | 6.8 | EG-seq2seq | 2020-05-12 |
| Deep Audio-Visual Speech Recognition | ✓ Link | 7.2 | TM-seq2seq | 2018-09-06 |
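The ranking metric, Word Error Rate, is the word-level edit distance (substitutions, insertions, deletions) between the hypothesis transcript and the reference, divided by the number of reference words. A minimal sketch of the standard computation is below; note it returns a fraction, whereas the leaderboard reports WER as a percentage, and papers may differ in text normalization before scoring.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance divided by
    the number of reference words (lower is better)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / len(ref)

# One substitution ("the" -> "a") over 6 reference words.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

For reference scoring in practice, toolkits such as NIST's sclite or the `jiwer` Python package implement the same metric with standardized normalization.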