OpenCodePapers
Zero-Shot Video Question Answering on EgoSchema
Task: Video Question Answering
Leaderboard
Paper | Code | Accuracy (%) | Inference Speed (s) | Model | Release Date
Tarsier: Recipes for Training and Evaluating Large Video Description Models | ✓ | 68.6 | - | Tarsier (34B) | 2024-06-30
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | ✓ | 68.4 | - | VideoChat-T (7B) | 2024-10-25
Language Repository for Long Video Understanding | ✓ | 66.2 | - | LangRepo (12B) | 2024-03-21
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | ✓ | 66.2 | - | VideoTree (GPT4) | 2024-05-29
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA | ✓ | 66.0 | - | LVNet | 2024-06-13
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 65.6 | - | VideoChat2_HD_mistral | 2023-11-28
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 63.6 | - | VideoChat2_mistral | 2023-11-28
Understanding Long Videos with Multimodal Language Models | ✓ | 60.3 | 2.42 | MVU (13B) | 2024-03-25
TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | ✓ | 57.8 | - | TS-LLaVA-34B | 2024-11-17
A Simple LLM Framework for Long-Range Video Question-Answering | ✓ | 57.6 | - | LLoVi (GPT-3.5) | 2023-12-28
A Simple LLM Framework for Long-Range Video Question-Answering | ✓ | 50.8 | - | LLoVi (7B) | 2023-12-28
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | ✓ | 47.2 | - | SlowFast-LLaVA-34B | 2024-07-22
Self-Chained Image-Language Model for Video Localization and Question Answering | ✓ | 25.7 | - | SeViLA (4B) | 2023-05-11
n/a | - | 20.0 | - | Random | -
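The Random row's 20.0 is chance-level accuracy. Every EgoSchema question offers five answer options, so uniform guessing is right one time in five.

The following is a minimal sketch of how a multiple-choice accuracy score and that chance baseline could be computed. It assumes that Accuracy is the percentage of questions answered correctly. The `MCQuestion` record, the helper function names and the synthetic questions are all hypothetical; this is not EgoSchema's official evaluation script.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MCQuestion:
    """One multiple-choice question (hypothetical layout, not EgoSchema's file format)."""
    question_id: str
    num_options: int   # five for every EgoSchema question
    answer_index: int  # 0-based index of the correct option


def accuracy(questions, predictions):
    """Percentage of questions whose predicted option matches the answer.

    `predictions` maps question_id -> predicted option index; a missing
    prediction counts as wrong.
    """
    if not questions:
        raise ValueError("no questions to score")
    correct = sum(
        1 for q in questions if predictions.get(q.question_id) == q.answer_index
    )
    return 100.0 * correct / len(questions)


def expected_chance_accuracy(questions):
    """Expected accuracy (%) of guessing uniformly: the mean of 1 / num_options."""
    if not questions:
        raise ValueError("no questions to score")
    # math.fsum avoids accumulated rounding error when summing many fractions.
    return 100.0 * math.fsum(1.0 / q.num_options for q in questions) / len(questions)


def random_guess(questions, seed=0):
    """Simulate a random baseline by picking one option uniformly per question."""
    rng = random.Random(seed)
    return {q.question_id: rng.randrange(q.num_options) for q in questions}


if __name__ == "__main__":
    # Synthetic five-option questions standing in for a real benchmark split.
    qs = [MCQuestion(f"q{i}", 5, i % 5) for i in range(5000)]
    print(f"expected chance: {expected_chance_accuracy(qs):.1f}")
    print(f"simulated random: {accuracy(qs, random_guess(qs)):.1f}")
```

With five options per question, the expected-chance line prints 20.0, which matches the Random row. The simulated figure fluctuates around that value depending on the seed.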