OpenCodePapers
Video Question Answering
Zero-Shot Video Question Answer on TVQA
Leaderboard
| Paper | Code | Accuracy (%) | Model | Release Date |
|---|---|---|---|---|
| Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | ✓ | 59.7 | FrozenBiLM (with speech) | 2022-06-16 |
| An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | ✓ | 57.8 | IG-VLM (no speech, GPT-4V) | 2024-03-27 |
| MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | ✓ | 54.21 | MiniGPT4-video-7B | 2024-04-04 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 50.6 | VideoChat_HD_mistral (no speech) | 2023-11-28 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 46.4 | VideoChat_mistral (no speech) | 2023-11-28 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 40.6 | VideoChat2 (no speech) | 2023-11-28 |
| Self-Chained Image-Language Model for Video Localization and Question Answering | ✓ | 38.2 | SeViLA (no speech) | 2023-05-11 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ | 35.9 | InternVideo (no speech) | 2022-12-06 |
| Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | ✓ | 29.7 | FrozenBiLM (no speech) | 2022-06-16 |