Paper | Code | Accuracy | Model Name | Release Date |
---|---|---|---|---|
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | ✓ Link | 81.6 | GPT4RoI | 2023-07-07 |
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | | 70.5 | ERNIE-ViL-large (ensemble of 15 models) | 2020-06-30 |
UNITER: UNiversal Image-TExt Representation Learning | ✓ Link | 62.8 | UNITER (Large) | 2019-09-25 |
KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning | | 60.3 | KVL-BERTLARGE | 2020-12-13 |
VL-BERT: Pre-training of Generic Visual-Linguistic Representations | ✓ Link | 59.7 | VL-BERTLARGE | 2019-08-22 |
Unifying Vision-and-Language Tasks via Text Generation | ✓ Link | 58.9 | VL-T5 | 2021-02-04 |
VisualBERT: A Simple and Performant Baseline for Vision and Language | ✓ Link | 52.4 | VisualBERT | 2019-08-09 |