Visual Question Answering on BenchLMM
Results over time: chart of benchmark scores by model release date (values listed in the leaderboard below).
Leaderboard
| Paper | Code | GPT-3.5 score | Model | Release date |
|---|---|---|---|---|
| GPT-4 Technical Report | ✓ | 58.37 | GPT-4V | 2023-03-15 |
| SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | ✓ | 57.43 | Sphinx-V2-1K | 2023-11-13 |
| Improved Baselines with Visual Instruction Tuning | ✓ | 55.53 | LLaVA-1.5-13B | 2023-10-05 |
| Visual Instruction Tuning | ✓ | 46.83 | LLaVA-1.5-7B | 2023-04-17 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | 45.03 | InstructBLIP-13B | 2023-05-11 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | 44.63 | InstructBLIP-7B | 2023-05-11 |
| Visual Instruction Tuning | ✓ | 43.50 | LLaVA-1-13B | 2023-04-17 |
| Otter: A Multi-Modal Model with In-Context Instruction Tuning | ✓ | 39.13 | Otter-7B | 2023-05-05 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | ✓ | 34.93 | MiniGPT4-13B | 2023-04-20 |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | ✓ | 30.1 | MiniGPTv2-7B | 2023-10-14 |