OpenCodePapers

visual-question-answering-vqa-on-core-mm

Visual Question Answering (VQA)
Results over time (chart: metric scores plotted against model release dates)
Leaderboard
| Paper | Code | Overall score | Deductive | Abductive | Analogical | Params | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|
| GPT-4 Technical Report | ✓ Link | 74.44 | 74.86 | 77.88 | 69.86 | | GPT-4V | 2023-03-15 |
| SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | ✓ Link | 39.48 | 42.17 | 49.85 | 20.69 | 16B | SPHINX v2 | 2023-11-13 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | ✓ Link | 37.39 | 37.55 | 44.39 | 30.42 | 16B | Qwen-VL-Chat | 2023-08-24 |
| CogVLM: Visual Expert for Pretrained Language Models | ✓ Link | 37.16 | 36.75 | 47.88 | 28.75 | 17B | CogVLM-Chat | 2023-11-06 |
| Improved Baselines with Visual Instruction Tuning | ✓ Link | 32.62 | 30.94 | 47.91 | 24.31 | 13B | LLaVA-1.5 | 2023-10-05 |
| LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | ✓ Link | 30.46 | 28.7 | 46.12 | 22.08 | 7B | LLaMA-Adapter V2 | 2023-04-28 |
| Emu: Generative Pretraining in Multimodality | ✓ Link | 28.24 | 28.9 | 36.57 | 18.19 | 14B | Emu | 2023-07-11 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ Link | 28.02 | 27.56 | 37.76 | 20.56 | 8B | InstructBLIP | 2023-05-11 |
| InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | ✓ Link | 26.84 | 26.77 | 35.97 | 18.61 | 9B | InternLM-XComposer-VL | 2023-09-26 |
| Otter: A Multi-Modal Model with In-Context Instruction Tuning | ✓ Link | 22.69 | 22.49 | 33.64 | 13.33 | 7B | Otter | 2023-05-05 |
| mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | ✓ Link | 20.05 | 23.43 | 20.6 | 7.64 | 7B | mPLUG-Owl2 | 2023-11-07 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 19.3 | 12.76 | 18.96 | 7.5 | 3B | BLIP-2-OPT2.7B | 2023-01-30 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | ✓ Link | 10.43 | 11.02 | 13.28 | 5.69 | 8B | MiniGPT-v2 | 2023-04-20 |
| OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models | ✓ Link | 6.82 | 8.88 | 5.3 | 1.11 | 9B | OpenFlamingo-v2 | 2023-08-02 |
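The "Results over time" chart on the original page can be reproduced from the leaderboard table. Below is a minimal sketch, assuming the table has been saved locally as `core_mm_leaderboard.csv` with the column names used above (the filename and CSV workflow are assumptions, not part of the page); it plots each metric against model release date with pandas and matplotlib.

```python
# Sketch only: rebuild the "Results over time" chart from the leaderboard table.
# Assumes the table above was exported to core_mm_leaderboard.csv (hypothetical file).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("core_mm_leaderboard.csv", parse_dates=["Release Date"])
df = df.sort_values("Release Date")

fig, ax = plt.subplots(figsize=(8, 4))
for metric in ["Overall score", "Deductive", "Abductive", "Analogical"]:
    ax.plot(df["Release Date"], df[metric], marker="o", label=metric)

# Label each point with the model name, standing in for the hover tooltips
# of the interactive chart.
for _, row in df.iterrows():
    ax.annotate(row["Model Name"], (row["Release Date"], row["Overall score"]),
                fontsize=7, rotation=30)

ax.set_xlabel("Model release date")
ax.set_ylabel("CORE-MM score")
ax.legend()
fig.tight_layout()
fig.savefig("core_mm_results_over_time.png", dpi=150)
```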