OpenCodePapers
Long-Context Understanding on MMNeedle
Results over time
Leaderboard
All metric columns report Exact Accuracy for the given number of input images and stitching grid.

| Paper | Code | 1 Image, 4×4 | 1 Image, 8×8 | 1 Image, 2×2 | 10 Images, 1×1 | 10 Images, 2×2 | 10 Images, 4×4 | 10 Images, 8×8 | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4 Technical Report | ✓ | 83 | 19 | 94.6 | 97 | 81.8 | 26.9 | 1 | GPT-4o | 2023-03-15 |
| GPT-4 Technical Report | ✓ | 54.72 | 7.3 | 86.09 | 72.36 | 34.24 | 7.58 | 0 | GPT-4V | 2023-03-15 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | ✓ | 39.85 | 29.81 | 90.34 | 89.94 | 45.21 | 6.09 | 0.62 | Gemini Pro 1.5 | 2024-03-08 |
| Gemini: A Family of Highly Capable Multimodal Models | ✓ | 24.78 | 2.11 | 29.53 | 16.25 | 4.82 | 0.4 | 0 | Gemini Pro 1.0 | 2023-12-19 |
| LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | ✓ | 17.5 | 3.3 | 43.8 | 0 | 0 | 0 | 0 | LLaVA-Llama-3 | 2024-03-18 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | — | 12.3 | 1.6 | 52.25 | 66.93 | 4.6 | 0.4 | 0 | Claude 3 Opus | 2024-03-04 |
| What matters when building vision-language models? | — | 7.8 | 0.9 | 18.9 | 0 | 0 | 0 | 0 | IDEFICS2-8B | 2024-05-03 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | 6.2 | 2.2 | 3.8 | 0 | 0 | 0 | 0 | InstructBLIP-Flan-T5-XXL | 2023-05-11 |
| CogVLM: Visual Expert for Pretrained Language Models | ✓ | 0.9 | 0.1 | 7.3 | 0 | 0 | 0 | 0 | CogVLM2-Llama-3 | 2023-11-06 |
| mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | ✓ | 0.3 | 0.7 | 1.9 | 0.4 | 0.1 | 0 | 0 | mPLUG-Owl-v2 | 2023-11-07 |
| CogVLM: Visual Expert for Pretrained Language Models | ✓ | 0.1 | 0.3 | 0 | 0 | 0 | 0 | 0 | CogVLM-17B | 2023-11-06 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | InstructBLIP-Vicuna-13B | 2023-05-11 |
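As the column names indicate, each MMNeedle input consists of one or ten images, each optionally stitched into an N×N grid of sub-images, and a prediction counts toward Exact Accuracy only when the needle's location is identified exactly. The snippet below is a minimal sketch of how such a score could be computed; the `NeedleLocation` fields, helper names, and example data are illustrative assumptions, not the benchmark's actual evaluation code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NeedleLocation:
    """Assumed answer format: the image holding the needle and, for stitched
    settings, the row/column of the sub-image inside the N x N grid."""
    image_idx: int             # 0 in the single-image settings
    row: Optional[int] = None  # None for 1x1 (unstitched) inputs
    col: Optional[int] = None

def exact_accuracy(predictions, ground_truths):
    """Percentage of samples whose predicted location matches the ground truth exactly."""
    assert len(predictions) == len(ground_truths) and ground_truths
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return 100.0 * correct / len(ground_truths)

# Illustrative usage for a 10-image, 2x2-stitching sample set.
gts   = [NeedleLocation(3, 0, 1), NeedleLocation(7, 1, 1), NeedleLocation(0, 0, 0)]
preds = [NeedleLocation(3, 0, 1), NeedleLocation(7, 1, 0), NeedleLocation(0, 0, 0)]
print(f"Exact accuracy: {exact_accuracy(preds, gts):.2f}%")  # 66.67%
```

Under such a definition, partial matches (right image, wrong cell) earn no credit, which is consistent with how sharply the scores in the table drop as the stitching grid grows.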