OpenCodePapers
Visual Question Answering (VQA) on CORE-MM
Leaderboard
| Paper | Code | Overall score | Deductive | Abductive | Analogical | Params | Model name | Release date |
|---|---|---|---|---|---|---|---|---|
| GPT-4 Technical Report | ✓ | 74.44 | 74.86 | 77.88 | 69.86 | - | GPT-4V | 2023-03-15 |
| SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | ✓ | 39.48 | 42.17 | 49.85 | 20.69 | 16B | SPHINX v2 | 2023-11-13 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | ✓ | 37.39 | 37.55 | 44.39 | 30.42 | 16B | Qwen-VL-Chat | 2023-08-24 |
| CogVLM: Visual Expert for Pretrained Language Models | ✓ | 37.16 | 36.75 | 47.88 | 28.75 | 17B | CogVLM-Chat | 2023-11-06 |
| Improved Baselines with Visual Instruction Tuning | ✓ | 32.62 | 30.94 | 47.91 | 24.31 | 13B | LLaVA-1.5 | 2023-10-05 |
| LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | ✓ | 30.46 | 28.7 | 46.12 | 22.08 | 7B | LLaMA-Adapter V2 | 2023-04-28 |
| Emu: Generative Pretraining in Multimodality | ✓ | 28.24 | 28.9 | 36.57 | 18.19 | 14B | Emu | 2023-07-11 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | 28.02 | 27.56 | 37.76 | 20.56 | 8B | InstructBLIP | 2023-05-11 |
| InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | ✓ | 26.84 | 26.77 | 35.97 | 18.61 | 9B | InternLM-XComposer-VL | 2023-09-26 |
| Otter: A Multi-Modal Model with In-Context Instruction Tuning | ✓ | 22.69 | 22.49 | 33.64 | 13.33 | 7B | Otter | 2023-05-05 |
| mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | ✓ | 20.05 | 23.43 | 20.6 | 7.64 | 7B | mPLUG-Owl2 | 2023-11-07 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 19.31 | 2.76 | 18.96 | 7.5 | 3B | BLIP-2-OPT2.7B | 2023-01-30 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | ✓ | 10.43 | 11.02 | 13.28 | 5.69 | 8B | MiniGPT-v2 | 2023-04-20 |
| OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models | ✓ | 6.82 | 8.88 | 5.3 | 1.11 | 9B | OpenFlamingo-v2 | 2023-08-02 |