OpenCodePapers

Video Question Answering on MSRVTT-MC

Video Question Answering
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|
| An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | ✓ | 97.6 | VIOLETv2 | 2022-09-04 |
| HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | | 97.4 | HiTeA | 2022-12-30 |
| VindLU: A Recipe for Effective Video-and-Language Pretraining | ✓ | 95.5 | VindLU | 2022-12-09 |
| Clover: Towards A Unified Video-Language Alignment and Fusion Model | ✓ | 95.2 | Clover | 2022-07-16 |
| Revealing Single Frame Bias for Video-and-Language Learning | ✓ | 93.7 | Singularity-temporal | 2022-06-07 |
| Multi-granularity Correspondence Learning from Long-term Noisy Videos | ✓ | 92.7 | Norton | 2024-01-30 |
| Revealing Single Frame Bias for Video-and-Language Learning | ✓ | 92.1 | Singularity | 2022-06-07 |