OpenCodePapers

Image-to-Text Retrieval on COCO

Leaderboard

Paper	Code	Recall@1 (%)	Recall@5 (%)	Recall@10 (%)	Model Name	Release Date
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	✓ Link	85.4	97.0	98.5	BLIP-2 (ViT-G, fine-tuned)	2023-01-30
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	✓ Link	84.1	96.3	98.3	ONE-PEACE (ViT-G, w/o ranking)	2023-05-18
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	✓ Link	83.5	96.0	98.0	BLIP-2 (ViT-L, fine-tuned)	2023-01-30
Learning Relation Alignment for Calibrated Cross-modal Retrieval	✓ Link	67.78	89.7	94.48	IAIS	2021-05-28
Learning Transferable Visual Models From Natural Language Supervision	✓ Link	58.4	81.5	88.1	CLIP (zero-shot)	2021-02-26
FLAVA: A Foundational Language And Vision Alignment Model	✓ Link	42.74	76.76		FLAVA (ViT-B, zero-shot)	2021-12-08
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks	✓ Link			99.8	Oscar	2020-04-13
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training				97.2	Unicoder-VL	2019-08-16
Deep Visual-Semantic Alignments for Generating Image Descriptions	✓ Link			74.8	DVSA	2014-12-07
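
The Recall@K columns above report, for each query image, whether at least one of its ground-truth captions appears among the K highest-scoring captions, averaged over all query images and expressed as a percentage. The sketch below illustrates that computation under stated assumptions: the function name `recall_at_k`, the nested-list similarity matrix, and the toy scores are hypothetical and are not taken from any of the listed papers, which may differ in test split and evaluation protocol.

```python
def recall_at_k(similarity, gt_captions, ks=(1, 5, 10)):
    """Image-to-text Recall@K in percent.

    similarity  -- one row per image, one score per caption (higher = better match)
    gt_captions -- for each image, the set of caption indices that describe it
    """
    # Rank caption indices for every image once, best score first.
    rankings = [
        sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        for scores in similarity
    ]
    results = {}
    for k in ks:
        # An image counts as a hit if any ground-truth caption is in its top k.
        hits = sum(
            1
            for ranked, gt in zip(rankings, gt_captions)
            if any(j in gt for j in ranked[:k])
        )
        results[k] = 100.0 * hits / len(similarity)
    return results


# Toy data (hypothetical): 3 images, 6 captions, 2 ground-truth captions each.
toy_similarity = [
    [0.90, 0.10, 0.20, 0.30, 0.00, 0.05],  # best caption is 0 (correct)
    [0.80, 0.10, 0.70, 0.20, 0.00, 0.05],  # first correct caption is ranked 2nd
    [0.90, 0.80, 0.70, 0.60, 0.50, 0.10],  # first correct caption is ranked 5th
]
toy_ground_truth = [{0, 1}, {2, 3}, {4, 5}]

toy_recall = recall_at_k(toy_similarity, toy_ground_truth, ks=(1, 2, 5))
```

On this toy input, one of three images is matched at K=1, two at K=2, and all three at K=5, so recall can only stay level or rise as K grows. The same monotonic pattern is visible in every fully populated row of the leaderboard.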