OpenCodePapers
Video Question Answering on OVBench
Leaderboard
| Paper | Code | AVG | Model | Release date |
|---|---|---|---|---|
| Seed1.5-VL Technical Report | | 60.0 | Seed1.5-VL | 2025-05-11 |
| Online Video Understanding: OVBench and VideoChat-Online | ✓ | 54.9 | VideoChat-Online (4B) | 2024-12-31 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | ✓ | 50.7 | Gemini-1.5-Flash | 2024-03-08 |
| Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | ✓ | 49.7 | Qwen2-VL (7B) | 2024-09-18 |
| LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 49.5 | LLaVA-OneVision (7B) | 2024-08-06 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | ✓ | 48.7 | InternVL2 (7B) | 2024-12-06 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | ✓ | 44.1 | InternVL2 (4B) | 2024-12-06 |
| Long Context Transfer from Language to Vision | ✓ | 43.6 | LongVA (7B) | 2024-06-24 |
| LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | ✓ | 41.9 | LLaMA-VID (7B) | 2023-11-28 |
| | | 39.1 | MiniCPM-V 2.6 (7B) | |
| VTimeLLM: Empower LLM to Grasp Video Moments | ✓ | 33.1 | VTimeLLM (7B) | 2023-11-30 |
| Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | ✓ | 31.2 | Flash-Vstream (7B) | 2024-06-12 |
| MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | ✓ | 30.9 | MovieChat (7B) | 2023-07-31 |
| LITA: Language Instructed Temporal-Localization Assistant | ✓ | 20.4 | LITA (7B) | 2024-03-27 |
| TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | ✓ | 12.8 | TimeChat (7B) | 2023-12-04 |
| VideoLLM-online: Online Video Large Language Model for Streaming Video | | 9.6 | VideoLLM-Online (7B) | 2024-06-17 |