OpenCodePapers

Image-to-Text Retrieval on Flickr30k

Leaderboard

Paper	Code	Recall@1 (%)	Recall@5 (%)	Recall@10 (%)	Recall@Sum	Model	Release date
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks	✓ Link	97.9	100	100		InternVL-G-FT (finetuned, w/o ranking)	2023-12-21
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	✓ Link	97.6	100	100		BLIP-2 ViT-G (zero-shot, 1K test set)	2023-01-30
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	✓ Link	97.6	100	100		ONE-PEACE (finetuned, w/o ranking)	2023-05-18
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks	✓ Link	97.2	100	100		InternVL-C-FT (finetuned, w/o ranking)	2023-12-21
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	✓ Link	96.9	100	100		BLIP-2 ViT-L (zero-shot, 1K test set)	2023-01-30
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training	✓ Link	96.1	99.9	100.0		ERNIE-ViL 2.0	2022-09-30
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation	✓ Link	95.9	99.8	100.0		ALBEF	2021-07-16
HADA: A Graph-based Amalgamation Framework in Image-text Retrieval	✓ Link	92.6	99.3	99.9		HADA (ALBEF)	2023-01-11
HADA: A Graph-based Amalgamation Framework in Image-text Retrieval	✓ Link	87.3	98	99.2		HADA (UNITER)	2023-01-11
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval	✓ Link	76.4	94.3	97.3	268	GSMN	2021-06-04
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval	✓ Link	71	91.9	96.1	259	LGSGM	2021-06-04
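The Recall@K columns follow the usual image-to-text retrieval protocol. Every test image ranks all test captions by similarity, and it counts as retrieved at K if at least one of its ground-truth captions appears in the top K. The score is the percentage of images for which this holds. Where Recall@Sum is filled in, it equals Recall@1 + Recall@5 + Recall@10. For GSMN, 76.4 + 94.3 + 97.3 = 268.0. The sketch below assumes a precomputed image-caption similarity matrix and a list of ground-truth caption indices per image. The function names `image_to_text_recall` and `recall_sum` are hypothetical. Individual papers may handle details such as tie-breaking or the construction of the 1K test split differently.

```python
from typing import Dict, Sequence


def image_to_text_recall(
    sim: Sequence[Sequence[float]],
    gt_captions: Sequence[Sequence[int]],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Return Recall@K (in %) for image-to-text retrieval.

    sim[i][j] is the similarity between image i and caption j.
    gt_captions[i] lists the caption indices that describe image i
    (Flickr30k pairs each image with five captions).
    """
    hits = {k: 0 for k in ks}
    for i, row in enumerate(sim):
        # Rank all caption indices by descending similarity to image i;
        # sorted() is stable, so ties keep the lower caption index first.
        ranking = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        gt = set(gt_captions[i])
        # 0-based position of the best-ranked ground-truth caption.
        best_rank = min(pos for pos, j in enumerate(ranking) if j in gt)
        for k in ks:
            # A hit at K if any ground-truth caption is in the top K.
            if best_rank < k:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(sim) for k in ks}


def recall_sum(recalls: Dict[int, float]) -> float:
    """Recall@Sum: the plain sum of the Recall@K values."""
    return sum(recalls.values())


if __name__ == "__main__":
    # Toy example: 2 images, 4 captions, 2 ground-truth captions per image.
    sim = [[0.9, 0.1, 0.8, 0.2],
           [0.7, 0.3, 0.6, 0.5]]
    gt = [[0, 1], [2, 3]]
    r = image_to_text_recall(sim, gt, ks=(1, 2))
    print(r)              # {1: 50.0, 2: 100.0}
    print(recall_sum(r))  # 150.0
```

Applied to the GSMN row, `recall_sum({1: 76.4, 5: 94.3, 10: 97.3})` gives 268.0 up to floating-point rounding. This matches the Recall@Sum value in the table.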