Visual Question Answering on V* Bench

Visual Question Answering

Leaderboard

Paper	Code	Accuracy (%)	Model Name	Release Date
FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering		92.15	LLaVA-OneVision7B w. FOCUS	2025-06-25
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration	✓ Link	90.58	LLaVA-OneVision7B w. ZoomEye	2024-11-25
Instruction-Guided Visual Masking	✓ Link	81.2	IVM-Enhanced GPT4-V	2024-05-30
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs	✓ Link	75.39	SEAL	2023-12-21
LLaVA-OneVision: Easy Visual Task Transfer	✓ Link	74.46	LLaVA-OneVision7B	2024-08-06

OpenCodePapers