OpenCodePapers

Lipreading on Lip Reading in the Wild

Leaderboard
| Paper | Code | Top-1 Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | ✓ Link | 95.0 | SyncVSR (Word Boundary) | 2024-06-18 |
| Training Strategies for Improved Lip-reading | ✓ Link | 94.1 | 3D Conv + ResNet-18 + DC-TCN + KD (Ensemble & Word Boundary) | 2022-09-03 |
| SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | ✓ Link | 93.2 | SyncVSR | 2024-06-18 |
| Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | ✓ Link | 89.57 | AVCRFormer | 2024-05-09 |
| Accurate and Resource-Efficient Lipreading with Efficientnetv2 and Transformers | | 89.52 | 3D Conv + EfficientNetV2 + Transformer + TCN | 2022-05-23 |
| Visual Speech Recognition in a Driver Assistance System | | 88.7 | Vosk + MediaPipe + LS + MixUp + SA + 3DResNet-18 + BiLSTM + Cosine WR | 2022-08-29 |
| Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | ✓ Link | 88.5 | 3D Conv + ResNet-18 + MS-TCN + Multi-Head Visual-Audio Memory | 2022-04-04 |
| Towards Practical Lipreading with Distilled and Efficient Models | ✓ Link | 88.5 | 3D Conv + ResNet-18 + MS-TCN + KD (Ensemble) | 2020-07-13 |
| Learn an Effective Lip Reading Model without Pains | ✓ Link | 88.4 | 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR (Word Boundary) | 2020-11-15 |
| Learn an Effective Lip Reading Model without Pains | ✓ Link | 85.5 | 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR | 2020-11-15 |
| Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video | ✓ Link | 85.4 | 3D Conv + ResNet-18 + Bi-GRU + Visual-Audio Memory | 2022-04-04 |
| Lipreading using Temporal Convolutional Networks | ✓ Link | 85.30 | 3D Conv + ResNet-18 + MS-TCN | 2020-01-23 |
| Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition | ✓ Link | 85.02 | 3D Conv + ResNet-18 + Bi-GRU (Face Cutout) | 2020-03-06 |
| Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | ✓ Link | 85.0 | MoCo + Wav2Vec by SJTU LUMIA | 2022-02-24 |
| Discriminative Multi-modality Speech Recognition | ✓ Link | 84.80 | 3D Conv + P3D-ResNet50 + TCN | 2020-05-12 |
| Mutual Information Maximization for Effective Lip Reading | ✓ Link | 84.41 | 3D Conv + ResNet-18 + Bi-GRU | 2020-03-13 |
| SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading | ✓ Link | 84.4 | SpotFast + Transformer + Product-Key memory | 2020-05-21 |
| Deformation Flow Based Two-Stream Network for Lip Reading | ✓ Link | 84.13 | DFTN | 2020-03-12 |
| Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading | | 83.5 | PCPG | 2020-03-09 |
| End-to-end Audiovisual Speech Recognition | ✓ Link | 83.39 | 3D Conv + ResNet-34 + Bi-GRU | 2018-02-18 |
| Multi-Grained Spatio-temporal Modeling for Lip-reading | | 83.34 | Multi-grained + Bi-ConvLSTM | 2019-08-30 |
| Combining Residual Networks with LSTMs for Lipreading | ✓ Link | 83.00 | 3D Conv + ResNet-34 + Bi-LSTM | 2017-03-12 |