OpenCodePapers

Video Question Answering on OVBench
Leaderboard
| Paper | Code | AVG | Model Name | Release Date |
| --- | --- | --- | --- | --- |
| Seed1.5-VL Technical Report | | 60.0 | Seed1.5-VL | 2025-05-11 |
| Online Video Understanding: OVBench and VideoChat-Online | ✓ | 54.9 | VideoChat-Online (4B) | 2024-12-31 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | ✓ | 50.7 | Gemini-1.5-Flash | 2024-03-08 |
| Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | ✓ | 49.7 | Qwen2-VL (7B) | 2024-09-18 |
| LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 49.5 | LLaVA-OneVision (7B) | 2024-08-06 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | ✓ | 48.7 | InternVL2 (7B) | 2024-12-06 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | ✓ | 44.1 | InternVL2 (4B) | 2024-12-06 |
| Long Context Transfer from Language to Vision | ✓ | 43.6 | LongVA (7B) | 2024-06-24 |
| LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | ✓ | 41.9 | LLaMA-VID (7B) | 2023-11-28 |
| | | 39.1 | MiniCPM-V 2.6 (7B) | |
| VTimeLLM: Empower LLM to Grasp Video Moments | ✓ | 33.1 | VTimeLLM (7B) | 2023-11-30 |
| Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | ✓ | 31.2 | Flash-Vstream (7B) | 2024-06-12 |
| MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | ✓ | 30.9 | MovieChat (7B) | 2023-07-31 |
| LITA: Language Instructed Temporal-Localization Assistant | ✓ | 20.4 | LITA (7B) | 2024-03-27 |
| TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | ✓ | 12.8 | TimeChat (7B) | 2023-12-04 |
| VideoLLM-online: Online Video Large Language Model for Streaming Video | | 9.6 | VideoLLM-Online (7B) | 2024-06-17 |