Visual Question Answering (VQA) on VLM2-Bench
Results over time

(Interactive chart of per-metric scores by model release date; not reproduced in this text version.)
Leaderboard
Column key: GC = general cue, OC = object-centric cue, PC = person-centric cue; mat = matching, trk = tracking, cpr = comparison, cnt = counting, grp = grouping, VID = video identity describing. Average is the unweighted mean of the nine subtask scores; a ✓ in the Code column marks entries with released code.

| Paper | Code | GC-mat | GC-trk | OC-cpr | OC-cnt | OC-grp | PC-cpr | PC-cnt | PC-grp | PC-VID | Average | Model | Release date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o System Card | — | 37.45 | 39.27 | 74.17 | 80.62 | 57.50 | 50.00 | 90.50 | 47.00 | 66.75 | 60.36 | GPT-4o | 2024-10-25 |
| Qwen2.5-VL Technical Report | ✓ | 35.91 | 43.38 | 71.39 | 41.72 | 47.50 | 80.00 | 57.98 | 69.00 | 46.50 | 54.82 | Qwen2.5-VL-7B | 2025-02-19 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | ✓ | 30.50 | 30.59 | 43.33 | 51.48 | 52.50 | 59.50 | 59.70 | 61.00 | 21.75 | 45.59 | InternVL2.5-26B | 2024-12-06 |
| Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | ✓ | 27.80 | 19.18 | 68.06 | 45.99 | 35.00 | 61.50 | 58.59 | 49.00 | 16.25 | 42.37 | Qwen2-VL-7B | 2024-09-18 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | ✓ | 21.24 | 26.03 | 53.33 | 55.23 | 46.50 | 51.50 | 60.00 | 52.00 | 5.25 | 41.23 | InternVL2.5-8B | 2024-12-06 |
| Video Instruction Tuning With Synthetic Data | — | 18.53 | 12.79 | 54.72 | 62.47 | 28.50 | 62.00 | 66.91 | 25.00 | 59.00 | 43.32 | LLaVA-Video-7B | 2024-10-03 |
| mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | ✓ | 17.37 | 18.26 | 49.17 | 62.97 | 31.00 | 63.50 | 58.86 | 26.00 | 13.50 | 37.85 | mPLUG-Owl3-7B | 2024-08-09 |
| LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 16.60 | 13.70 | 47.22 | 56.17 | 27.50 | 62.00 | 46.67 | 37.00 | 47.25 | 39.35 | LLaVA-OneVision-7B | 2024-08-06 |
| Long Context Transfer from Language to Vision | ✓ | 14.29 | 19.18 | 26.67 | 42.53 | 18.50 | 21.50 | 38.90 | 18.00 | 3.75 | 22.59 | LongVA-7B | 2024-06-24 |
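
The Average column reproduces as the unweighted mean of the nine subtask scores; for example, GPT-4o's nine scores sum to 543.26, and 543.26 / 9 ≈ 60.36. A minimal Python sketch for checking a row (the score literals are copied from the table above; the variable names are ours, not from any official evaluation script):

```python
# Recompute the "Average" column as the unweighted mean of the
# nine VLM2-Bench subtask scores, using the two top leaderboard rows.
subtasks = ["GC-mat", "GC-trk", "OC-cpr", "OC-cnt", "OC-grp",
            "PC-cpr", "PC-cnt", "PC-grp", "PC-VID"]

scores = {
    "GPT-4o":        [37.45, 39.27, 74.17, 80.62, 57.50, 50.00, 90.50, 47.00, 66.75],
    "Qwen2.5-VL-7B": [35.91, 43.38, 71.39, 41.72, 47.50, 80.00, 57.98, 69.00, 46.50],
}

for model, vals in scores.items():
    assert len(vals) == len(subtasks)   # one score per subtask
    avg = sum(vals) / len(vals)         # unweighted mean over the 9 subtasks
    print(f"{model}: {avg:.2f}")        # GPT-4o: 60.36, Qwen2.5-VL-7B: 54.82
```

The printed values match the reported averages to two decimal places, which holds for every row in the table, so no subtask appears to be weighted differently.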