OpenCodePapers

Zero-Shot Video Question Answer on IntentQA

Leaderboard

Paper	Code	Accuracy (%)	Model	Release date
ENTER: Event Based Interpretable Reasoning for VideoQA		71.5	ENTER	2025-01-24
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA	✓ Link	71.1	LVNet	2024-06-13
TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	✓ Link	67.9	TS-LLaVA-34B	2024-11-17
VidCtx: Context-aware Video Question Answering with Image Models	✓ Link	67.1	VidCtx (7B)	2024-12-23
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos	✓ Link	66.9	VideoTree (GPT4)	2024-05-29
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	✓ Link	65.3	IG-VLM	2024-03-27
A Simple LLM Framework for Long-Range Video Question-Answering	✓ Link	64.0	LLoVi (GPT-4)	2023-12-28
Self-Chained Image-Language Model for Video Localization and Question Answering	✓ Link	60.9	SeViLA (4B)	2023-05-11
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	✓ Link	60.1	SlowFast-LLaVA-34B	2024-07-22
Language Repository for Long Video Understanding	✓ Link	59.1	LangRepo (12B)	2024-03-21
A Simple LLM Framework for Long-Range Video Question-Answering	✓ Link	53.6	LLoVi (7B)	2023-12-28
Mistral 7B	✓ Link	50.4	Mistral (7B)	2023-10-10
		20.0	Random