OpenCodePapers

Video Question Answering on OVBench

Video Question Answering

Leaderboard

Paper	Code	AVG	ModelName	ReleaseDate
Seed1.5-VL Technical Report		60.0	Seed1.5-VL	2025-05-11
Online Video Understanding: OVBench and VideoChat-Online	✓ Link	54.9	VideoChat-Online (4B)	2024-12-31
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	✓ Link	50.7	Gemini-1.5-Flash	2024-03-08
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	✓ Link	49.7	Qwen2-VL (7B)	2024-09-18
LLaVA-OneVision: Easy Visual Task Transfer	✓ Link	49.5	LLaVA-OneVision (7B)	2024-08-06
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	✓ Link	48.7	InternVL2 (7B)	2024-12-06
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	✓ Link	44.1	InternVL2 (4B)	2024-12-06
Long Context Transfer from Language to Vision	✓ Link	43.6	LongVA (7B)	2024-06-24
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models	✓ Link	41.9	LLaMA-VID (7B)	2023-11-28
(paper not listed)		39.1	MiniCPM-V 2.6 (7B)	
VTimeLLM: Empower LLM to Grasp Video Moments	✓ Link	33.1	VTimeLLM (7B)	2023-11-30
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams	✓ Link	31.2	Flash-VStream (7B)	2024-06-12
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	✓ Link	30.9	MovieChat (7B)	2023-07-31
LITA: Language Instructed Temporal-Localization Assistant	✓ Link	20.4	LITA (7B)	2024-03-27
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding	✓ Link	12.8	TimeChat (7B)	2023-12-04
VideoLLM-online: Online Video Large Language Model for Streaming Video		9.6	VideoLLM-Online (7B)	2024-06-17