OpenCodePapers
Zero-Shot Video Question Answer on TGIF-QA
Leaderboard
| Paper | Code | Accuracy | Confidence Score | Model Name | Release Date |
| --- | --- | --- | --- | --- | --- |
| Tarsier: Recipes for Training and Evaluating Large Video Description Models | ✓ | 82.5 | 4.4 | Tarsier (34B) | 2024-06-30 |
| LinVT: Empower Your Image-level Large Language Model to Understand Videos | ✓ | 81.3 | 4.3 | LinVT-Qwen2-VL (7B) | 2024-12-06 |
| TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | ✓ | 81.0 | 4.2 | TS-LLaVA-34B | 2024-11-17 |
| PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | ✓ | 80.6 | 4.3 | PLLaVA | 2024-04-25 |
| SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | ✓ | 80.6 | 4.3 | SlowFast-LLaVA-34B | 2024-07-22 |
| An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | ✓ | 79.1 | 4.2 | IG-VLM | 2024-03-27 |
| VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | ✓ | 74.6 | 4.1 | VideoGPT+ | 2024-06-13 |
| MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | ✓ | 72.22 | - | MiniGPT4-video-7B | 2024-04-04 |
| Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | ✓ | 70.0 | 4.0 | Video-LLaVA-7B | 2023-11-16 |
| Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | ✓ | 69.0 | 3.8 | Chat-UniVi-7B | 2023-11-14 |
| Elysium: Exploring Object-level Perception in Videos via MLLM | ✓ | 66.6 | 3.6 | Elysium | 2024-03-25 |
| Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | ✓ | 51.4 | 3.0 | Video-ChatGPT-7B | 2023-06-08 |
| Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | ✓ | 41.9 | - | FrozenBiLM | 2022-06-16 |
| VideoChat: Chat-Centric Video Understanding | ✓ | 34.4 | 2.3 | Video Chat-7B | 2023-05-10 |