Paper | Code | ADD(S) AUC | Model Name | Release Date |
---|---|---|---|---|
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | ✓ Link | 91.73 | AlignCMSS | 2023-09-28 |
VinVL: Revisiting Visual Representations in Vision-Language Models | ✓ Link | 88.56 | VinVL | 2021-01-02 |
AdsCVLR: Commercial Visual-Linguistic Representation Modeling in Sponsored Search | | 87.90 | AdsCVLR | 2022-10-10 |
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks | ✓ Link | 87.45 | OSCAR | 2020-04-13 |
VL-BERT: Pre-training of Generic Visual-Linguistic Representations | ✓ Link | 86.27 | VL-BERT | 2019-08-22 |
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | ✓ Link | 83.51 | BLIP | 2022-01-28 |
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training | | 83.16 | Unicoder-VL | 2019-08-16 |
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | ✓ Link | 82.74 | ALBEF | 2021-07-16 |