OpenCodePapers

Visual Question Answering (VQA) on VLM2-bench

Leaderboard

Paper	Code	GC-mat	GC-trk	OC-cpr	OC-cnt	OC-grp	PC-cpr	PC-cnt	PC-grp	PC-VID	Average Score on VLM2-bench (9 subtasks)	ModelName	ReleaseDate
GPT-4o System Card		37.45	39.27	74.17	80.62	57.50	50.00	90.50	47.00	66.75	60.36	GPT-4o	2024-10-25
Qwen2.5-VL Technical Report	✓ Link	35.91	43.38	71.39	41.72	47.50	80.00	57.98	69.00	46.50	54.82	Qwen2.5-VL-7B	2025-02-19
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	✓ Link	30.50	30.59	43.33	51.48	52.50	59.50	59.70	61.00	21.75	45.59	InternVL2.5-26B	2024-12-06
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	✓ Link	27.80	19.18	68.06	45.99	35.00	61.50	58.59	49.00	16.25	42.37	Qwen2-VL-7B	2024-09-18
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	✓ Link	21.24	26.03	53.33	55.23	46.50	51.50	60.00	52.00	5.25	41.23	InternVL2.5-8B	2024-12-06
Video Instruction Tuning With Synthetic Data		18.53	12.79	54.72	62.47	28.50	62.00	66.91	25.00	59.00	43.32	LLaVA-Video-7B	2024-10-03
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models	✓ Link	17.37	18.26	49.17	62.97	31.00	63.50	58.86	26.00	13.50	37.85	mPLUG-Owl3-7B	2024-08-09
LLaVA-OneVision: Easy Visual Task Transfer	✓ Link	16.60	13.70	47.22	56.17	27.50	62.00	46.67	37.00	47.25	39.35	LLaVA-OneVision-7B	2024-08-06
Long Context Transfer from Language to Vision	✓ Link	14.29	19.18	26.67	42.53	18.50	21.50	38.90	18.00	3.75	22.59	LongVA-7B	2024-06-24
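
The "Average Score on VLM2-bench (9 subtasks)" column appears to be the unweighted arithmetic mean of the nine subtask scores, rounded to two decimals. That is an assumption inferred from the numbers in the table, not something the page states. The sketch below recomputes the average for three rows copied from the leaderboard and checks it against the reported value. The names `ROWS`, `average_score`, `matches_reported`, and `rank_by_average` are illustrative only.

```python
# Subtask columns, in the same order as the leaderboard header.
SUBTASKS = ["GC-mat", "GC-trk", "OC-cpr", "OC-cnt", "OC-grp",
            "PC-cpr", "PC-cnt", "PC-grp", "PC-VID"]

# model name -> (nine subtask scores, reported average), copied from the table.
ROWS = {
    "GPT-4o": ([37.45, 39.27, 74.17, 80.62, 57.50,
                50.00, 90.50, 47.00, 66.75], 60.36),
    "Qwen2.5-VL-7B": ([35.91, 43.38, 71.39, 41.72, 47.50,
                       80.00, 57.98, 69.00, 46.50], 54.82),
    "LongVA-7B": ([14.29, 19.18, 26.67, 42.53, 18.50,
                   21.50, 38.90, 18.00, 3.75], 22.59),
}


def average_score(scores):
    """Unweighted mean of the subtask scores (assumed aggregation rule)."""
    if len(scores) != len(SUBTASKS):
        raise ValueError(f"expected {len(SUBTASKS)} scores, got {len(scores)}")
    return sum(scores) / len(scores)


def matches_reported(scores, reported, tol=0.006):
    """True if the recomputed mean agrees with the reported value up to rounding."""
    return abs(average_score(scores) - reported) <= tol


def rank_by_average(rows):
    """Model names sorted from highest to lowest recomputed average."""
    return sorted(rows, key=lambda name: average_score(rows[name][0]), reverse=True)


if __name__ == "__main__":
    for name, (scores, reported) in ROWS.items():
        status = "ok" if matches_reported(scores, reported) else "mismatch"
        print(f"{name}: recomputed {average_score(scores):.2f}, "
              f"reported {reported:.2f} ({status})")
    print("Ranking by average:", rank_by_average(ROWS))
```

The table rows are listed in descending order of GC-mat rather than of the average, so a helper like `rank_by_average` is useful when comparing models on the aggregate score.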