OpenCodePapers

Zero-Shot Video Question Answer on VNBench

Leaderboard
Paper | Code | Accuracy (%) | Model Name | Release Date
--- | --- | --- | --- | ---
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | ✓ | 77.88 | BIMBA-LLaVA-Qwen2-7B | 2025-03-12
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | ✓ | 66.7 | Gemini | 2024-03-08
LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 58.7 | LLaVA-OneVision-72B | 2024-08-06
LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 51.8 | LLaVA-OneVision-7B | 2024-08-06
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | ✓ | 33.9 | Qwen2-VL-7B | 2024-09-18
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | ✓ | 20.1 | LLaVA-NeXT-Video-7B | 2024-07-10
VideoChat: Chat-Centric Video Understanding | ✓ | 12.4 | VideoChat2 | 2023-05-10
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | ✓ | 4.5 | VideoLLaMA2 | 2024-06-11
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | ✓ | 4.1 | VideoChatGPT | 2023-06-08