OpenCodePapers

Image-to-Text Retrieval on COCO
Leaderboard
| Paper | Code | Recall@1 | Recall@5 | Recall@10 | Model Name | Release Date |
|---|---|---|---|---|---|---|
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 85.4 | 97.0 | 98.5 | BLIP-2 (ViT-G, fine-tuned) | 2023-01-30 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | ✓ Link | 84.1 | 96.3 | 98.3 | ONE-PEACE (ViT-G, w/o ranking) | 2023-05-18 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 83.5 | 96.0 | 98.0 | BLIP-2 (ViT-L, fine-tuned) | 2023-01-30 |
| Learning Relation Alignment for Calibrated Cross-modal Retrieval | ✓ Link | 67.78 | 89.7 | 94.48 | IAIS | 2021-05-28 |
| Learning Transferable Visual Models From Natural Language Supervision | ✓ Link | 58.4 | 81.5 | 88.1 | CLIP (zero-shot) | 2021-02-26 |
| FLAVA: A Foundational Language And Vision Alignment Model | ✓ Link | 42.74 | 76.76 | - | FLAVA (ViT-B, zero-shot) | 2021-12-08 |
| Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks | ✓ Link | - | - | 99.8 | Oscar | 2020-04-13 |
| Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training | - | - | - | 97.2 | Unicoder-VL | 2019-08-16 |
| Deep Visual-Semantic Alignments for Generating Image Descriptions | ✓ Link | - | - | 74.8 | DVSA | 2014-12-07 |
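
The Recall@1, Recall@5, and Recall@10 columns can be illustrated with a minimal sketch. It assumes the common convention for image-to-text retrieval: a query image counts as a hit at K if at least one of its ground-truth captions appears among the K highest-scoring captions, and the score is the percentage of query images that are hits. The function name `recall_at_k`, its arguments, and the toy data are hypothetical and not taken from any listed paper; individual papers may differ in test split and protocol.

```python
import numpy as np

def recall_at_k(similarity, gt_captions, ks=(1, 5, 10)):
    """Percentage of images with at least one matching caption in the top k.

    similarity:  (num_images, num_captions) array, higher means more similar.
    gt_captions: list of sets; gt_captions[i] holds the caption indices
                 that are correct for image i (hypothetical input layout).
    """
    # Sort caption indices per image from highest to lowest similarity.
    ranking = np.argsort(-similarity, axis=1)
    results = {}
    for k in ks:
        hits = [
            bool(gt_captions[i] & set(ranking[i, :k].tolist()))
            for i in range(similarity.shape[0])
        ]
        results[k] = 100.0 * float(np.mean(hits))
    return results

# Toy data: 2 images, 4 captions; captions 0-1 describe image 0, 2-3 image 1.
sim = np.array([[0.9, 0.1, 0.8, 0.2],
                [0.7, 0.3, 0.6, 0.1]])
gt = [{0, 1}, {2, 3}]
scores = recall_at_k(sim, gt, ks=(1, 2))
print(scores)  # {1: 50.0, 2: 100.0}
```

In the toy data, image 0 retrieves a correct caption at rank 1, while image 1 only does so at rank 2, so Recall@1 is 50.0 and Recall@2 is 100.0.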