OpenCodePapers

Zero-Shot Video Question Answering on TGIF-QA

Leaderboard

Paper	Code	Accuracy (%)	Confidence Score	Model Name	Release Date
Tarsier: Recipes for Training and Evaluating Large Video Description Models	✓ Link	82.5	4.4	Tarsier (34B)	2024-06-30
LinVT: Empower Your Image-level Large Language Model to Understand Videos	✓ Link	81.3	4.3	LinVT-Qwen2-VL (7B)	2024-12-06
TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	✓ Link	81.0	4.2	TS-LLaVA-34B	2024-11-17
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning	✓ Link	80.6	4.3	PLLaVA	2024-04-25
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	✓ Link	80.6	4.3	SlowFast-LLaVA-34B	2024-07-22
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	✓ Link	79.1	4.2	IG-VLM	2024-03-27
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	✓ Link	74.6	4.1	VideoGPT+	2024-06-13
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	✓ Link	72.22		MiniGPT4-video-7B	2024-04-04
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection	✓ Link	70.0	4.0	Video-LLaVA-7B	2023-11-16
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	✓ Link	69.0	3.8	Chat-UniVi-7B	2023-11-14
Elysium: Exploring Object-level Perception in Videos via MLLM	✓ Link	66.6	3.6	Elysium	2024-03-25
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	✓ Link	51.4	3.0	Video-ChatGPT-7B	2023-06-08
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models	✓ Link	41.9		FrozenBiLM	2022-06-16
VideoChat: Chat-Centric Video Understanding	✓ Link	34.4	2.3	VideoChat-7B	2023-05-10