OpenCodePapers
Image-to-Text Retrieval on Flickr30k
Leaderboard
| Paper | Code | Recall@1 | Recall@5 | Recall@10 | Recall@Sum | Model Name | Release Date |
|---|---|---|---|---|---|---|---|
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | ✓ | 97.9 | 100 | 100 | | InternVL-G-FT (finetuned, w/o ranking) | 2023-12-21 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 97.6 | 100 | 100 | | BLIP-2 ViT-G (zero-shot, 1K test set) | 2023-01-30 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | ✓ | 97.6 | 100 | 100 | | ONE-PEACE (finetuned, w/o ranking) | 2023-05-18 |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | ✓ | 97.2 | 100 | 100 | | InternVL-C-FT (finetuned, w/o ranking) | 2023-12-21 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 96.9 | 100 | 100 | | BLIP-2 ViT-L (zero-shot, 1K test set) | 2023-01-30 |
| ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training | ✓ | 96.1 | 99.9 | 100.0 | | ERNIE-ViL 2.0 | 2022-09-30 |
| Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | ✓ | 95.9 | 99.8 | 100.0 | | ALBEF | 2021-07-16 |
| HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | ✓ | 92.6 | 99.3 | 99.9 | | ALBEF | 2023-01-11 |
| HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | ✓ | 87.3 | 98 | 99.2 | | UNITER | 2023-01-11 |
| A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | ✓ | 76.4 | 94.3 | 97.3 | 268 | GSMN | 2021-06-04 |
| A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | ✓ | 71 | 91.9 | 96.1 | 259 | LGSGM | 2021-06-04 |
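The Recall@K columns can be read as the percentage of query images for which at least one correct caption appears among the top K retrieved texts. The two rows that report Recall@Sum (268 and 259) match the sum of their Recall@1, Recall@5 and Recall@10 values, so the sketch below assumes that definition. It is a minimal illustration rather than the evaluation code of any listed paper; the function names `recall_at_k` and `recall_sum`, the similarity matrix `sim`, and the ground-truth structure `gt` are all hypothetical.

```python
import numpy as np


def recall_at_k(sim, gt, ks=(1, 5, 10)):
    """Image-to-text Recall@K in percent.

    sim: array of shape (n_images, n_texts); higher means more similar.
    gt:  list of sets; gt[i] holds the indices of the correct captions
         for image i (hypothetical layout, chosen for this sketch).
    """
    sim = np.asarray(sim, dtype=float)
    # Rank captions for each image from most to least similar.
    ranking = np.argsort(-sim, axis=1)
    recalls = {}
    for k in ks:
        # An image counts as a hit if any correct caption is in its top k.
        hits = [
            bool(gt[i] & set(ranking[i, :k].tolist()))
            for i in range(sim.shape[0])
        ]
        recalls[k] = 100.0 * sum(hits) / len(hits)
    return recalls


def recall_sum(recalls):
    """Sum of the per-K recalls (assumed definition of Recall@Sum)."""
    return sum(recalls.values())
```

Under this assumed definition, the GSMN row gives 76.4 + 94.3 + 97.3 = 268.0 and the LGSGM row gives 71 + 91.9 + 96.1 = 259.0, which agree with the reported Recall@Sum values after rounding.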