OpenCodePapers

Visual Question Answering on VCR (Q→A test)

Visual Question Answering (VQA)
Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | ✓ Link | 89.4 | GPT4RoI | 2023-07-07 |
| ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | | 81.6 | ERNIE-ViL-large (ensemble of 15 models) | 2020-06-30 |
| UNITER: UNiversal Image-TExt Representation Learning | ✓ Link | 79.8 | UNITER-large (10 ensemble) | 2019-09-25 |
| Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | | 79.6 | MAD (single model, formerly CLIP-TD) | 2022-04-22 |
| UNITER: UNiversal Image-TExt Representation Learning | ✓ Link | 77.3 | UNITER (large) | 2019-09-25 |
| KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning | | 76.4 | KVL-BERT-LARGE | 2020-12-13 |
| VL-BERT: Pre-training of Generic Visual-Linguistic Representations | ✓ Link | 75.8 | VL-BERT-LARGE | 2019-08-22 |
| Unifying Vision-and-Language Tasks via Text Generation | ✓ Link | 75.3 | VL-T5 | 2021-02-04 |
| VisualBERT: A Simple and Performant Baseline for Vision and Language | ✓ Link | 71.6 | VisualBERT | 2019-08-09 |
| Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations | ✓ Link | 71.2 | OFA-X | 2022-12-08 |
| Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations | ✓ Link | 62.0 | OFA-X-MT | 2022-12-08 |
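For reference, VCR's Q→A subtask is four-way multiple choice, so the Accuracy column is the percentage of test questions whose highest-scoring answer candidate matches the ground truth. A minimal sketch of that computation (the function name and the example scores are illustrative, not from any listed model):

```python
def qa_accuracy(choice_scores, labels):
    """Percentage of questions where the argmax answer choice is correct.

    choice_scores: per-question lists of scores, one score per answer choice
                   (four choices in VCR Q->A).
    labels:        per-question index of the correct choice.
    """
    correct = 0
    for scores, label in zip(choice_scores, labels):
        pred = max(range(len(scores)), key=scores.__getitem__)  # argmax
        correct += int(pred == label)
    return 100.0 * correct / len(labels)


# Illustrative example: 3 questions, 2 predicted correctly -> 66.7
scores = [[0.1, 0.7, 0.1, 0.1],   # pred 1, label 1: correct
          [0.4, 0.2, 0.3, 0.1],   # pred 0, label 0: correct
          [0.2, 0.2, 0.5, 0.1]]   # pred 2, label 3: wrong
labels = [1, 0, 3]
print(round(qa_accuracy(scores, labels), 1))  # → 66.7
```

The leaderboard figures above are this quantity evaluated on the held-out VCR Q→A test split via the benchmark's evaluation server.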