OpenCodePapers
Visual Question Answering on VCR (Q-A) test
Visual Question Answering (VQA)
Leaderboard

| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | ✓ | 89.4 | GPT4RoI | 2023-07-07 |
| ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | | 81.6 | ERNIE-ViL-large (ensemble of 15 models) | 2020-06-30 |
| UNITER: UNiversal Image-TExt Representation Learning | ✓ | 79.8 | UNITER-large (10-model ensemble) | 2019-09-25 |
| Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | | 79.6 | MAD (single model, formerly CLIP-TD) | 2022-04-22 |
| UNITER: UNiversal Image-TExt Representation Learning | ✓ | 77.3 | UNITER (large) | 2019-09-25 |
| KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning | | 76.4 | KVL-BERT (large) | 2020-12-13 |
| VL-BERT: Pre-training of Generic Visual-Linguistic Representations | ✓ | 75.8 | VL-BERT (large) | 2019-08-22 |
| Unifying Vision-and-Language Tasks via Text Generation | ✓ | 75.3 | VL-T5 | 2021-02-04 |
| VisualBERT: A Simple and Performant Baseline for Vision and Language | ✓ | 71.6 | VisualBERT | 2019-08-09 |
| Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations | ✓ | 71.2 | OFA-X | 2022-12-08 |
| Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations | ✓ | 62 | OFA-X-MT | 2022-12-08 |