OpenCodePapers

Visual Question Answering on CLEVR

Visual Question Answering (VQA)

Leaderboard

Paper	Code	Accuracy (%)	Model Name	Release Date
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding	✓ Link	99.8	NS-VQA (1K programs)	2018-10-04
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding	✓ Link	99.7	MDETR	2021-04-26
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization	✓ Link	99.7	NeSyCoCo	2024-12-20
Interpretable Visual Reasoning via Induced Symbolic Space	✓ Link	99.4	OCCAM	2020-11-23
Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning	✓ Link	99.1	TbD + reg + hres	2018-03-14
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision	✓ Link	98.9	NS-CL	2019-04-26
Compositional Attention Networks for Machine Reasoning	✓ Link	98.9	MAC	2018-03-08
Learning Visual Question Answering by Bootstrapping Hard Attention	✓ Link	98.8	CNN + LSTM + RN + HAN	2018-08-01
DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer		98.3	DDRprog*	2018-03-30
Language-Conditioned Graph Networks for Relational Reasoning	✓ Link	97.9	single-hop + LCGN	2019-05-10
FiLM: Visual Reasoning with a General Conditioning Layer	✓ Link	97.7	CNN+GRU+FiLM	2017-09-22
Explainable and Explicit Visual Reasoning over Scene Graphs	✓ Link	97.7	XNM-Det supervised	2018-12-05
Inferring and Executing Programs for Visual Reasoning	✓ Link	96.9	IEP-700K	2017-05-10
A simple neural network module for relational reasoning	✓ Link	95.5	CNN + LSTM + RN	2017-06-05
Question-Guided Hybrid Convolution for Visual Question Answering		65.9	QGHC+Att+Concat	2018-08-08