OpenCodePapers

Video Question Answering on AGQA 2.0 Balanced

Video Question Answering
Leaderboard
| Paper | Code | Average Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | ✓ | 55.08 | GF (sup) - Faster RCNN | 2024-01-03 |
| MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering | ✓ | 54.39 | MIST - CLIP | 2022-12-19 |
| Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | ✓ | 53.33 | GF (uns) - S3D | 2024-01-03 |
| SViTT: Temporal Learning of Sparse Video-Text Transformers | ✓ | 52.7 | SViTT | 2023-04-18 |
| MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering | ✓ | 50.96 | MIST - AIO | 2022-12-19 |
| Learning Situation Hyper-Graphs for Video Question Answering | ✓ | 49.2 | SHG-VQA (trained from scratch) | 2023-04-18 |
| Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | ✓ | 48.59 | AIO - ViT | 2024-01-03 |
| MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering | | 44.36 | MMTF | 2023-10-06 |