OpenCodePapers
Image-to-Text Retrieval on COCO
[Chart: Results over time]
Leaderboard
Paper
Code
Recall@1
Recall@5
Recall@10
ModelName
ReleaseDate
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
✓ Link
85.4
97.0
98.5
BLIP-2 (ViT-G, fine-tuned)
2023-01-30
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
✓ Link
84.1
96.3
98.3
ONE-PEACE (ViT-G, w/o ranking)
2023-05-18
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
✓ Link
83.5
96.0
98.0
BLIP-2 (ViT-L, fine-tuned)
2023-01-30
Learning Relation Alignment for Calibrated Cross-modal Retrieval
✓ Link
67.78
89.7
94.48
IAIS
2021-05-28
Learning Transferable Visual Models From Natural Language Supervision
✓ Link
58.4
81.5
88.1
CLIP (zero-shot)
2021-02-26
FLAVA: A Foundational Language And Vision Alignment Model
✓ Link
42.74
76.76
FLAVA (ViT-B, zero-shot)
2021-12-08
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
✓ Link
99.8
Oscar
2020-04-13
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
97.2
Unicoder-VL
2019-08-16
Deep Visual-Semantic Alignments for Generating Image Descriptions
✓ Link
74.8
DVSA
2014-12-07