OpenCodePapers

Zero-Shot Video Question Answer on VNBench

Video Question Answering / Zero-Shot Video Question Answer

Leaderboard

Paper	Code	Accuracy (%)	Model Name	Release Date
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering	✓ Link	77.88	BIMBA-LLaVA-Qwen2-7B	2025-03-12
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	✓ Link	66.7	Gemini	2024-03-08
LLaVA-OneVision: Easy Visual Task Transfer	✓ Link	58.7	LLaVA-OneVision-72B	2024-08-06
LLaVA-OneVision: Easy Visual Task Transfer	✓ Link	51.8	LLaVA-OneVision-7B	2024-08-06
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	✓ Link	33.9	Qwen2-VL-7B	2024-09-18
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models	✓ Link	20.1	LLaVA-NeXT-Video-7B	2024-07-10
VideoChat: Chat-Centric Video Understanding	✓ Link	12.4	VideoChat2	2023-05-10
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	✓ Link	4.5	VideoLLaMA2	2024-06-11
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	✓ Link	4.1	VideoChatGPT	2023-06-08