Visual Reasoning on NLVR2 (dev)
Task: Visual Reasoning | Dataset: NLVR2 (dev split)
Results over time: accuracy on NLVR2 dev plotted against model release date (interactive chart omitted).
Leaderboard
| Paper | Code | Accuracy (%) | Model | Release Date |
|---|---|---|---|---|
| Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | ✓ | 91.51 | BEiT-3 | 2022-08-22 |
| X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | ✓ | 88.7 | X2-VLM (large) | 2022-11-22 |
| Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | ✓ | 87.6 | XFM (base) | 2023-01-12 |
| X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | ✓ | 86.2 | X2-VLM (base) | 2022-11-22 |
| CoCa: Contrastive Captioners are Image-Text Foundation Models | ✓ | 86.1 | CoCa | 2022-05-04 |
| VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | ✓ | 85.64 | VLMo | 2021-11-03 |
| Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | ✓ | 84.6 | VK-OOD | 2023-09-21 |
| SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | ✓ | 84.53 | SimVLM | 2021-08-24 |
| Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | ✓ | 84.41 | X-VLM (base) | 2021-11-16 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | ✓ | 83.9 | VK-OOD | 2023-02-11 |
| Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | ✓ | 83.14 | ALBEF (14M) | 2021-07-16 |
| Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | ✓ | 76.37 | SOHO | 2021-04-07 |
| ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | ✓ | 75.7 | ViLT-B/32 | 2021-02-05 |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers | ✓ | 74.9 | LXMERT (Pre-train + scratch) | 2019-08-20 |
| VisualBERT: A Simple and Performant Baseline for Vision and Language | ✓ | 66.7 | VisualBERT | 2019-08-09 |