OpenCodePapers

Visual Question Answering (VQA) on InfoSeek

Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers | ✓ Link | 30.65 | RA-VQAv2 w/ PreFLMR | 2024-02-13 |
| PaLI-X: On Scaling up a Multilingual Vision and Language Model | ✓ Link | 24 | PaLI-X | 2023-05-29 |
| Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | ✓ Link | 20.9 | CLIP + FiD | 2023-02-23 |
| Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | ✓ Link | 20.4 | CLIP + PaLM (540B) | 2023-02-23 |
| Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | ✓ Link | 19.7 | PaLI | 2023-02-23 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 14.6 | BLIP-2 | 2023-01-30 |
| | | 14.5 | InstructBLIP | |
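
For reference, a minimal sketch of how the leaderboard rows above could be held and re-ranked programmatically. The record layout, the field names (`model`, `accuracy`, `release_date`), and the use of Python are illustrative assumptions, not an OpenCodePapers schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LeaderboardEntry:
    """One row of the InfoSeek VQA leaderboard above (hypothetical schema)."""
    model: str
    accuracy: float                # InfoSeek VQA accuracy, higher is better
    release_date: Optional[date]   # None where the page lists no date


ENTRIES = [
    LeaderboardEntry("RA-VQAv2 w/ PreFLMR", 30.65, date(2024, 2, 13)),
    LeaderboardEntry("PaLI-X", 24.0, date(2023, 5, 29)),
    LeaderboardEntry("CLIP + FiD", 20.9, date(2023, 2, 23)),
    LeaderboardEntry("CLIP + PaLM (540B)", 20.4, date(2023, 2, 23)),
    LeaderboardEntry("PaLI", 19.7, date(2023, 2, 23)),
    LeaderboardEntry("BLIP-2", 14.6, date(2023, 1, 30)),
    LeaderboardEntry("InstructBLIP", 14.5, None),
]

# Rank by accuracy; this reproduces the ordering of the table above.
for rank, entry in enumerate(
    sorted(ENTRIES, key=lambda e: e.accuracy, reverse=True), start=1
):
    print(f"{rank}. {entry.model}: {entry.accuracy}")
```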