OpenCodePapers

Visual Question Answering on A-OKVQA

Visual Question Answering (VQA)
[Chart: results over time, per metric and model]
Leaderboard
| Paper | Code | MC Accuracy | DA VQA Score | Model | Release Date |
|---|---|---|---|---|---|
| Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | | 83.75 | 70.55 | SMoLA-PaLI-X Specialist Model | 2023-12-01 |
| Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | | 80.4 | 68.2 | PaLI-X-VPD | 2023-12-05 |
| Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering | ✓ | 75.1 | 58.5 | Prophet | 2023-03-03 |
| PromptCap: Prompt-Guided Task-Aware Image Captioning | ✓ | 73.2 | 59.6 | PromptCap | 2022-11-15 |
| Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training | ✓ | 71 | | MC-CoT | 2023-11-23 |
| HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | ✓ | 56.35 | | HYDRA | 2024-03-19 |
| Webly Supervised Concept Expansion for General Purpose Vision Models | | 53.7 | 40.7 | GPV-2 | 2022-02-04 |
| KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA | | 42.2 | 42.2 | KRISP | 2020-12-20 |
| ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | ✓ | 42.1 | 12.0 | ViLBERT - VQA | 2019-08-06 |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers | ✓ | 41.6 | 25.9 | LXMERT | 2019-08-20 |
| ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | ✓ | 41.5 | 25.9 | ViLBERT | 2019-08-06 |
| Pythia v0.1: the Winning Entry to the VQA Challenge 2018 | ✓ | 40.1 | 21.9 | Pythia | 2018-07-26 |
| ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | ✓ | 34.1 | 9.2 | ViLBERT - OK-VQA | 2019-08-06 |
| A Simple Baseline for Knowledge-Based Visual Question Answering | | | 57.5 | A Simple Baseline for KB-VQA | 2023-10-20 |
| VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge | ✓ | | 38.05 | VLC-BERT | 2022-10-24 |

MC Accuracy is multiple-choice accuracy; DA VQA Score is the direct-answer score. ✓ marks entries with released code.
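For context on the DA VQA Score column: direct-answer VQA benchmarks are commonly scored with the "soft" accuracy from the original VQA evaluation, where a predicted answer earns min(n/3, 1) credit if n human annotators gave that same answer. The sketch below illustrates that metric; it is a simplified illustration (no answer normalization), not the exact A-OKVQA evaluation code.

```python
def vqa_soft_accuracy(predicted: str, annotator_answers: list[str]) -> float:
    """Soft VQA accuracy for one question.

    A prediction matching n annotator answers scores min(n / 3, 1),
    so agreement with at least 3 humans counts as fully correct.
    Real evaluation scripts also lowercase and normalize answers first.
    """
    matches = sum(1 for a in annotator_answers if a == predicted)
    return min(matches / 3.0, 1.0)
```

For example, a prediction matching only one of the annotators scores 1/3, while matching three or more scores 1.0; a model's DA score is this value averaged over all questions (reported as a percentage in the table above).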