OpenCodePapers
Zero-Shot Video Question Answer on VNBench
Task: Video Question Answering
Leaderboard
| Paper | Code | Accuracy (%) | Model | Release date |
|---|---|---|---|---|
| BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | ✓ | 77.88 | BIMBA-LLaVA-Qwen2-7B | 2025-03-12 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | ✓ | 66.7 | Gemini | 2024-03-08 |
| LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 58.7 | LLaVA-OneVision-72B | 2024-08-06 |
| LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 51.8 | LLaVA-OneVision-7B | 2024-08-06 |
| Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | ✓ | 33.9 | Qwen2-VL-7B | 2024-09-18 |
| LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | ✓ | 20.1 | LLaVA-NeXT-Video-7B | 2024-07-10 |
| VideoChat: Chat-Centric Video Understanding | ✓ | 12.4 | VideoChat2 | 2023-05-10 |
| VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | ✓ | 4.5 | VideoLLaMA2 | 2024-06-11 |
| Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | ✓ | 4.1 | VideoChatGPT | 2023-06-08 |