OpenCodePapers

Visual Question Answering on VQA v2 (val)

Task: Visual Question Answering
Dataset: VQA v2, validation split
Results over time: interactive chart of accuracy versus model release date (not reproduced here).
Leaderboard

| Paper | Code | Accuracy (%) | Model Name | Release Date |
|-------|------|--------------|------------|--------------|
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 82.19 | BLIP-2 ViT-G OPT 6.7B (fine-tuned) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 81.59 | BLIP-2 ViT-G OPT 2.7B (fine-tuned) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 81.55 | BLIP-2 ViT-G FlanT5 XL (fine-tuned) | 2023-01-30 |
| Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | ✓ Link | 55.9 | LocVLM-L | 2024-04-11 |
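
For context on the metric: the Accuracy column is the standard soft VQA accuracy used by the VQA benchmarks, under which a predicted answer scores min(1, n/3) when n of the ten human annotators gave that answer. Below is a minimal sketch of this scoring rule; the official evaluator additionally normalizes answer strings (case, punctuation, articles) and averages over leave-one-out subsets of annotators, which this simplified version omits.

```python
def vqa_soft_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Simplified VQA soft accuracy: full credit if at least 3 of the
    (typically 10) human annotators gave the predicted answer."""
    matches = sum(ans == prediction for ans in human_answers)
    return min(matches / 3.0, 1.0)


# Example: 5 of 10 annotators answered "2", so the prediction scores 1.0.
answers = ["2", "2", "two", "2", "2", "3", "2 cats", "two", "2", "3"]
print(vqa_soft_accuracy("2", answers))  # 1.0
```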
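
The BLIP-2 entries above have public implementations; as an illustration of how such a model answers a visual question, the sketch below runs inference with the Hugging Face transformers BLIP-2 API. One assumption to note: "Salesforce/blip2-opt-2.7b" is the publicly released pretrained checkpoint on the Hub, not necessarily the exact fine-tuned weights behind the 81.59 entry in the table.

```python
# Minimal BLIP-2 VQA inference sketch (assumes a CUDA GPU and the public
# "Salesforce/blip2-opt-2.7b" checkpoint, which may differ from the
# fine-tuned weights scored in the leaderboard).
import requests
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

# Any RGB image works; this COCO val2017 image is a common demo input.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# BLIP-2 uses a "Question: ... Answer:" prompt format for VQA.
prompt = "Question: how many cats are there? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    "cuda", torch.float16
)

generated_ids = model.generate(**inputs, max_new_tokens=10)
answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(answer)  # e.g. "two"
```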