OpenCodePapers

Image Retrieval on Flickr30k 1K Test

Image Retrieval

Leaderboard

Paper	Code	R@1	R@5	R@10	Model Name	Release Date
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts	✓ Link	86.9	97.3	98.7	X-VLM (base)	2021-11-16
Plug-and-Play Regulators for Image-Text Matching	✓ Link	62.6	85.8	91.1	RCAR	2023-03-23
Similarity Reasoning and Filtration for Image-Text Matching	✓ Link	58.5	83.0	88.8	SGRAF	2021-01-05
VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words	✓ Link	57.4	82.0	88.1	VisualSparta	2021-01-01
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval	✓ Link	57.4	84.1	90.2	LGSGM	2021-06-04
Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders	✓ Link	56.5	81.2	88.2	TERAN MrSw	2020-08-12
Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders	✓ Link	55.7	83.1	89.3	TERAN Symm.	2020-08-12
Visual Semantic Reasoning for Image-Text Matching	✓ Link	54.7	81.8	88.2	VSRN	2019-09-06
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval	✓ Link	51.5	77.1	85.3	CAMP	2019-09-12
Stacked Cross Attention for Image-Text Matching	✓ Link	44.0	74.2	82.6	SCAN i-t	2018-03-21
Learning Semantic Concepts and Order for Image and Sentence Matching		41.1	70.5	80.1	SCO	2017-12-06
Dual Attention Networks for Multimodal Reasoning and Matching	✓ Link	39.4	69.2	79.1	DAN	2016-11-02
Linking Image and Text with 2-Way Nets	✓ Link	36.0			2WayNet (VGG)	2016-08-29
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM		30.2		72.3	SM-LSTM (VGG)	2016-11-17
Learning Deep Structure-Preserving Image-Text Embeddings		29.7	60.1	72.1	SPE	2015-11-19
Multimodal Convolutional Neural Networks for Matching Image and Sentence	✓ Link	26.2	56.3	69.6	mCNN	2015-04-23
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models	✓ Link	24.7	53.4	66.8	HGLMM FV	2015-05-19
Deep Visual-Semantic Alignments for Generating Image Descriptions	✓ Link	15.2		50.5	DVSA (R-CNN, AlexNet)	2014-12-07
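
The R@1, R@5, and R@10 columns are not defined on this page. The sketch below assumes they are the standard Recall@K metric for text-to-image retrieval, reported as a percentage: the share of caption queries whose ground-truth image appears among the top K retrieved images. The function name `recall_at_k`, the image IDs, and the toy rankings are hypothetical and only illustrate the computation; they are not taken from any of the listed papers.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Return Recall@K as a percentage.

    ranked_ids: one list per query, holding image IDs ordered best-first.
    gold_ids:   one ground-truth image ID per query (assumes a single
                correct image per caption query).
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("need exactly one ranking per query")
    if not gold_ids:
        raise ValueError("need at least one query")
    # A query counts as a hit when its ground-truth image is in the top k.
    hits = sum(
        1 for ranking, gold in zip(ranked_ids, gold_ids) if gold in ranking[:k]
    )
    return 100.0 * hits / len(gold_ids)


# Toy example (made-up data): four caption queries over three images.
rankings = [
    ["img_a", "img_b", "img_c"],
    ["img_c", "img_b", "img_a"],
    ["img_b", "img_c", "img_a"],
    ["img_a", "img_c", "img_b"],
]
gold = ["img_a", "img_b", "img_a", "img_b"]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(rankings, gold, k):.1f}")
```

Under this definition R@1 ≤ R@5 ≤ R@10 for any model, which matches every fully populated row in the table.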