
Long-Context Understanding on MMNeedle
Leaderboard
All values are Exact Accuracy (%). Each setting column gives the number of input images and the stitching grid, i.e., the grid of sub-images composing each input image (so "10 Images, 4×4" means ten input images, each stitched from a 4×4 grid of sub-images). "Release Date" is the release date of the paper, not of the model.

| Paper | Code | Model | Release Date | 1 Image, 2×2 | 1 Image, 4×4 | 1 Image, 8×8 | 10 Images, 1×1 | 10 Images, 2×2 | 10 Images, 4×4 | 10 Images, 8×8 |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4 Technical Report | ✓ | GPT-4o | 2023-03-15 | 94.6 | 83 | 19 | 97 | 81.8 | 26.9 | 1 |
| GPT-4 Technical Report | ✓ | GPT-4V | 2023-03-15 | 86.09 | 54.7 | 27.3 | 72.36 | 34.24 | 7.58 | 0 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | ✓ | Gemini Pro 1.5 | 2024-03-08 | 90.34 | 39.85 | 29.81 | 89.94 | 45.21 | 6.09 | 0.62 |
| Gemini: A Family of Highly Capable Multimodal Models | ✓ | Gemini Pro 1.0 | 2023-12-19 | 29.53 | 24.78 | 2.11 | 16.25 | 4.82 | 0.4 | 0 |
| LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | ✓ | LLaVA-Llama-3 | 2024-03-18 | 43.8 | 17.5 | 3.3 | 0 | 0 | 0 | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | — | Claude 3 Opus | 2024-03-04 | 52.25 | 12.3 | 1.6 | 66.9 | 34.6 | 0.4 | 0 |
| What matters when building vision-language models? | — | IDEFICS2-8B | 2024-05-03 | 18.9 | 7.8 | 0.9 | 0 | 0 | 0 | 0 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | InstructBLIP-Flan-T5-XXL | 2023-05-11 | 3.8 | 6.2 | 2.2 | 0 | 0 | 0 | 0 |
| CogVLM: Visual Expert for Pretrained Language Models | ✓ | CogVLM2-Llama-3 | 2023-11-06 | 7.3 | 0.9 | 0.1 | 0 | 0 | 0 | 0 |
| mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | ✓ | mPLUG-Owl-v2 | 2023-11-07 | 1.9 | 0.3 | 0.7 | 0.4 | 0.1 | 0 | 0 |
| CogVLM: Visual Expert for Pretrained Language Models | ✓ | CogVLM-17B | 2023-11-06 | 0 | 0.1 | 0.3 | 0 | 0 | 0 | 0 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | InstructBLIP-Vicuna-13B | 2023-05-11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
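As a concrete reading of these settings, below is a minimal sketch of the two pieces the column names refer to: stitching sub-images into an M×M collage, and the all-or-nothing exact-accuracy score over predicted needle locations. The function names, the `(image, row, col)` prediction format, and the example values are illustrative assumptions, not the benchmark's released evaluation code.

```python
# Hedged sketch of the MMNeedle-style setup implied by the column headers.
# All names and data formats here are assumptions made for illustration.
from typing import List, Tuple

import numpy as np

Location = Tuple[int, int, int]  # (image index, grid row, grid column)

def stitch_grid(sub_images: List[np.ndarray], m: int) -> np.ndarray:
    """Tile m*m equally sized (H, W, C) sub-images into one m x m collage."""
    assert len(sub_images) == m * m, "need exactly m*m sub-images"
    rows = [np.concatenate(sub_images[r * m : (r + 1) * m], axis=1)
            for r in range(m)]
    return np.concatenate(rows, axis=0)

def exact_accuracy(preds: List[Location], golds: List[Location]) -> float:
    """Percent of samples where image index, row, and column all match exactly."""
    assert len(preds) == len(golds) and golds
    hits = sum(p == g for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)

if __name__ == "__main__":
    # "1 Image, 4x4 Stitching": one input image built from a 4x4 grid of sub-images.
    subs = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(16)]
    collage = stitch_grid(subs, m=4)
    print(collage.shape)  # (256, 256, 3)

    # One of three predicted needle locations matches exactly -> 33.3%.
    golds = [(3, 1, 2), (0, 0, 0), (7, 3, 3)]
    preds = [(3, 1, 2), (0, 0, 1), (7, 2, 3)]
    print(f"{exact_accuracy(preds, golds):.1f}%")
```

The strictness of the exact match is why scores collapse toward zero in the 8×8 and multi-image settings: a prediction that is off by a single row, column, or image index scores nothing.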