OpenCodePapers

Visual Question Answering on V* Bench

Visual Question Answering
Results over time
Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering | | 92.15 | LLaVA-OneVision7B w. FOCUS | 2025-06-25 |
| ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration | ✓ | 90.58 | LLaVA-OneVision7B w. ZoomEye | 2024-11-25 |
| Instruction-Guided Visual Masking | ✓ | 81.2 | IVM-Enhanced GPT4-V | 2024-05-30 |
| V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs | ✓ | 75.39 | SEAL | 2023-12-21 |
| LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 74.46 | LLaVA-OneVision7B | 2024-08-06 |