OpenCodePapers

visual-question-answering-on-vqa-v2-val

Visual Question Answering (VQA)
Results over time
Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 65.2 | BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 2023-01-30 |
| Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training | ✓ Link | 63.3 | PNP-VQA | 2022-10-17 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 63.1 | BLIP-2 ViT-G FlanT5 XL (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 62.6 | BLIP-2 ViT-L FlanT5 XL (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 54.3 | BLIP-2 ViT-G OPT 6.7B (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 53.5 | BLIP-2 ViT-G OPT 2.7B (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 50.1 | BLIP-2 ViT-L OPT 2.7B (zero-shot) | 2023-01-30 |
| A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | ✓ Link | 47.7 | FewVLM (zero-shot) | 2021-10-16 |
| Language Models are General-Purpose Interfaces | ✓ Link | 41.1 | MetaLM | 2022-06-13 |
| Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | | 38.6 | VLKD (ViT-B/16) | 2021-11-16 |
| Multimodal Few-Shot Learning with Frozen Language Models | | 29.5 | Frozen | 2021-06-25 |
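
The Accuracy column presumably follows the standard VQA accuracy metric, where a predicted answer is scored against the ten human-provided answers as min(#matching answers / 3, 1) and averaged over all questions. The sketch below is a simplified version of that scoring rule (the official VQA evaluation additionally normalizes answers and averages over all 10-choose-9 subsets of human answers); function names and the input format are illustrative, not from any particular codebase.

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA accuracy for one question: min(#matching human answers / 3, 1).

    Assumes `predicted` and `human_answers` are already normalized
    (lowercased, punctuation and articles handled) as in the official VQA eval.
    """
    matches = sum(1 for ans in human_answers if ans == predicted)
    return min(matches / 3.0, 1.0)


def dataset_accuracy(predictions: dict[str, str],
                     annotations: dict[str, list[str]]) -> float:
    """Mean per-question VQA accuracy over the val split, as a percentage."""
    scores = [vqa_accuracy(predictions[qid], answers)
              for qid, answers in annotations.items()]
    return 100.0 * sum(scores) / len(scores)
```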