Visual Question Answering (VQA) on VQA v2 (val)
Results over time (chart): accuracy plotted against model release date.
Leaderboard
| Paper | Code | Accuracy (%) | Model | Release Date |
|---|---|---|---|---|
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 65.2 | BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 2023-01-30 |
| Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training | ✓ | 63.3 | PNP-VQA | 2022-10-17 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 63.1 | BLIP-2 ViT-G FlanT5 XL (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 62.6 | BLIP-2 ViT-L FlanT5 XL (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 54.3 | BLIP-2 ViT-G OPT 6.7B (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 53.5 | BLIP-2 ViT-G OPT 2.7B (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 50.1 | BLIP-2 ViT-L OPT 2.7B (zero-shot) | 2023-01-30 |
| A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | ✓ | 47.7 | FewVLM (zero-shot) | 2021-10-16 |
| Language Models are General-Purpose Interfaces | ✓ | 41.1 | MetaLM | 2022-06-13 |
| Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | ✗ | 38.6 | VLKD (ViT-B/16) | 2021-11-16 |
| Multimodal Few-Shot Learning with Frozen Language Models | ✗ | 29.5 | Frozen | 2021-06-25 |

(✓ = public code linked on the source page; ✗ = no code link available.)
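For context, the Accuracy column is the standard VQA accuracy metric, which scores a predicted answer against the ten human answers collected per question: a prediction earns full credit once at least three annotators gave it. A minimal sketch of the commonly cited form (the official evaluation script additionally normalizes answer strings and averages over annotator subsets):

```python
def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """VQA accuracy for one question: min(1, matches / 3), so a prediction
    matching at least 3 of the 10 human annotators scores 1.0."""
    matches = sum(answer == prediction for answer in human_answers)
    return min(1.0, matches / 3.0)
```

Every BLIP-2 entry above is zero-shot: a frozen image encoder and a frozen LLM are bridged by a lightweight Q-Former, so VQA reduces to prompted generation. Below is a minimal inference sketch using the Hugging Face Transformers port of one of the released FlanT5-based checkpoints; the image URL and question are placeholders.

```python
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# One of the BLIP-2 checkpoints Salesforce released on the Hugging Face Hub.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xl", torch_dtype=dtype
).to(device)

# Placeholder image and question.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "Question: how many cats are there? Answer:"  # BLIP-2's zero-shot VQA prompt format

inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, dtype)
generated = model.generate(**inputs, max_new_tokens=10)
print(processor.batch_decode(generated, skip_special_tokens=True)[0].strip())
```

Reproducing the leaderboard numbers additionally requires the paper's exact decoding settings and the official VQA evaluation pipeline; this sketch only illustrates the zero-shot prompting setup.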