OpenCodePapers

Audio-Visual Speech Recognition on LRS2
Leaderboard
| Paper | Code | Test WER (%) | Model Name | Release Date |
|---|---|---|---|---|
| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | ✓ | 1.4 | Whisper-Flamingo | 2024-06-14 |
| Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | ✓ | 1.5 | CTC/Attention | 2023-03-25 |
| Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | ✓ | 2.6 | MoCo + wav2vec (w/o extLM) | 2022-02-24 |
| End-to-end Audio-visual Speech Recognition with Conformers | ✓ | 3.7 | End2end Conformer | 2021-02-12 |
| Audio-visual Recognition of Overlapped speech for the LRS2 dataset | | 5.9 | LF-MMI TDNN | 2020-01-06 |
| Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | | 7.0 | CTC/Attention | 2018-09-28 |
| Deep Audio-Visual Speech Recognition | ✓ | 8.2 | TM-CTC | 2018-09-06 |
| Deep Audio-Visual Speech Recognition | ✓ | 8.5 | TM-Seq2seq | 2018-09-06 |