
Long-Context Understanding on MMNeedle
Leaderboard
All values are Exact Accuracy (%). Each setting column gives the number of input images and the stitching grid, i.e., the grid of sub-images composing each input image (so "10 Images, 4×4" means ten input images, each stitched from a 4×4 grid of sub-images). "Release Date" is the release date of the paper, not of the model.

| Paper | Code | Model | Release Date | 1 Image, 2×2 | 1 Image, 4×4 | 1 Image, 8×8 | 10 Images, 1×1 | 10 Images, 2×2 | 10 Images, 4×4 | 10 Images, 8×8 |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4 Technical Report | ✓ | GPT-4o | 2023-03-15 | 94.6 | 83 | 19 | 97 | 81.8 | 26.9 | 1 |
| GPT-4 Technical Report | ✓ | GPT-4V | 2023-03-15 | 86.09 | 54.7 | 27.3 | 72.36 | 34.24 | 7.58 | 0 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | ✓ | Gemini Pro 1.5 | 2024-03-08 | 90.34 | 39.85 | 29.81 | 89.94 | 45.21 | 6.09 | 0.62 |
| Gemini: A Family of Highly Capable Multimodal Models | ✓ | Gemini Pro 1.0 | 2023-12-19 | 29.53 | 24.78 | 2.11 | 16.25 | 4.82 | 0.4 | 0 |
| LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | ✓ | LLaVA-Llama-3 | 2024-03-18 | 43.8 | 17.5 | 3.3 | 0 | 0 | 0 | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | — | Claude 3 Opus | 2024-03-04 | 52.25 | 12.3 | 1.6 | 66.9 | 34.6 | 0.4 | 0 |
| What matters when building vision-language models? | — | IDEFICS2-8B | 2024-05-03 | 18.9 | 7.8 | 0.9 | 0 | 0 | 0 | 0 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | InstructBLIP-Flan-T5-XXL | 2023-05-11 | 3.8 | 6.2 | 2.2 | 0 | 0 | 0 | 0 |
| CogVLM: Visual Expert for Pretrained Language Models | ✓ | CogVLM2-Llama-3 | 2023-11-06 | 7.3 | 0.9 | 0.1 | 0 | 0 | 0 | 0 |
| mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | ✓ | mPLUG-Owl-v2 | 2023-11-07 | 1.9 | 0.3 | 0.7 | 0.4 | 0.1 | 0 | 0 |
| CogVLM: Visual Expert for Pretrained Language Models | ✓ | CogVLM-17B | 2023-11-06 | 0 | 0.1 | 0.3 | 0 | 0 | 0 | 0 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | InstructBLIP-Vicuna-13B | 2023-05-11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
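As a concrete reading of these settings, below is a minimal sketch of the two pieces the column names refer to: stitching sub-images into an M×M collage, and the all-or-nothing exact-accuracy score over predicted needle locations. The function names, the `(image, row, col)` prediction format, and the example values are illustrative assumptions, not the benchmark's released evaluation code.

```python
# Hedged sketch of the MMNeedle-style setup implied by the column headers.
# All names and data formats here are assumptions made for illustration.
from typing import List, Tuple

import numpy as np

Location = Tuple[int, int, int]  # (image index, grid row, grid column)

def stitch_grid(sub_images: List[np.ndarray], m: int) -> np.ndarray:
    """Tile m*m equally sized (H, W, C) sub-images into one m x m collage."""
    assert len(sub_images) == m * m, "need exactly m*m sub-images"
    rows = [np.concatenate(sub_images[r * m : (r + 1) * m], axis=1)
            for r in range(m)]
    return np.concatenate(rows, axis=0)

def exact_accuracy(preds: List[Location], golds: List[Location]) -> float:
    """Percent of samples where image index, row, and column all match exactly."""
    assert len(preds) == len(golds) and golds
    hits = sum(p == g for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)

if __name__ == "__main__":
    # "1 Image, 4x4 Stitching": one input image built from a 4x4 grid of sub-images.
    subs = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(16)]
    collage = stitch_grid(subs, m=4)
    print(collage.shape)  # (256, 256, 3)

    # One of three predicted needle locations matches exactly -> 33.3%.
    golds = [(3, 1, 2), (0, 0, 0), (7, 3, 3)]
    preds = [(3, 1, 2), (0, 0, 1), (7, 2, 3)]
    print(f"{exact_accuracy(preds, golds):.1f}%")
```

The strictness of the exact match is why scores collapse toward zero in the 8×8 and multi-image settings: a prediction that is off by a single row, column, or image index scores nothing.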