OpenCodePapers

video-question-answering-on-situated

Video Question Answering
Leaderboard
| Paper | Code | Average Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| ViLA: Efficient Video-Language Alignment for Video Question Answering | ✓ Link | 67.1 | VLAP (4 frames) | 2023-12-13 |
| Large Language Models are Temporal and Causal Reasoners for Video Question Answering | ✓ Link | 65.4 | LLaMA-VQA | 2023-10-24 |
| Self-Chained Image-Language Model for Video Localization and Question Answering | ✓ Link | 64.9 | SeViLA | 2023-05-11 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ Link | 58.7 | InternVideo | 2022-12-06 |
| Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | ✓ Link | 53.94 | GF(sup) | 2024-01-03 |
| Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | ✓ Link | 53.86 | GF(uns) | 2024-01-03 |
| MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering | ✓ Link | 51.13 | MIST | 2022-12-19 |
| Revisiting the "Video" in Video-Language Understanding | ✓ Link | 48.37 | Temp[ATP] | 2022-06-03 |
| AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model | ✓ Link | 48.2 | AnyMAL-70B (0-shot) | 2023-09-27 |
| All in One: Exploring Unified Video-Language Pre-training | ✓ Link | 47.5 | All-in-one | 2022-03-14 |
| TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | ✓ Link | 44.9 | TraveLER (0-shot) | 2024-04-01 |
| Self-Chained Image-Language Model for Video Localization and Question Answering | ✓ Link | 44.6 | SeViLA (0-shot) | 2023-05-11 |
| Flamingo: a Visual Language Model for Few-Shot Learning | ✓ Link | 42.8 | Flamingo-9B (4-shot) | 2022-04-29 |
| Flamingo: a Visual Language Model for Few-Shot Learning | ✓ Link | 42.4 | Flamingo-80B (4-shot) | 2022-04-29 |
| Flamingo: a Visual Language Model for Few-Shot Learning | ✓ Link | 41.8 | Flamingo-9B (0-shot) | 2022-04-29 |
| Flamingo: a Visual Language Model for Few-Shot Learning | ✓ Link | 39.7 | Flamingo-80B (0-shot) | 2022-04-29 |
| Learning Situation Hyper-Graphs for Video Question Answering | ✓ Link | 39.47 | SHG-VQA (trained from scratch) | 2023-04-18 |