OpenCodePapers

Image-to-Text Retrieval on Flickr30K
Leaderboard
| Paper | Code | Recall@1 | Recall@5 | Recall@10 | Recall@Sum | Model Name | Release Date |
|---|---|---|---|---|---|---|---|
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | ✓ | 97.9 | 100 | 100 | - | InternVL-G-FT (finetuned, w/o ranking) | 2023-12-21 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 97.6 | 100 | 100 | - | BLIP-2 ViT-G (zero-shot, 1K test set) | 2023-01-30 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | ✓ | 97.6 | 100 | 100 | - | ONE-PEACE (finetuned, w/o ranking) | 2023-05-18 |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | ✓ | 97.2 | 100 | 100 | - | InternVL-C-FT (finetuned, w/o ranking) | 2023-12-21 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 96.9 | 100 | 100 | - | BLIP-2 ViT-L (zero-shot, 1K test set) | 2023-01-30 |
| ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training | ✓ | 96.1 | 99.9 | 100.0 | - | ERNIE-ViL 2.0 | 2022-09-30 |
| Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | ✓ | 95.9 | 99.8 | 100.0 | - | ALBEF | 2021-07-16 |
| HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | ✓ | 92.6 | 99.3 | 99.9 | - | ALBEF | 2023-01-11 |
| HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | ✓ | 87.3 | 98 | 99.2 | - | UNITER | 2023-01-11 |
| A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | ✓ | 76.4 | 94.3 | 97.3 | 268 | GSMN | 2021-06-04 |
| A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | ✓ | 71 | 91.9 | 96.1 | 259 | LGSGM | 2021-06-04 |
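For readers unfamiliar with the metrics, the sketch below shows one common way Recall@K is computed for image-to-text retrieval: an image query counts as a hit if any of its ground-truth captions appears among the K highest-scoring captions. In the two rows that report it (GSMN and LGSGM), Recall@Sum matches the sum of Recall@1, Recall@5, and Recall@10. The function names, the similarity scores, and the toy data are assumptions made for illustration; the exact evaluation protocol of each listed paper may differ.

```python
def recall_at_k(similarity, ground_truth, k):
    """Percentage of image queries with at least one correct caption in the top k.

    similarity[i][j] is a hypothetical score between image i and caption j;
    ground_truth[i] is the set of caption indices that describe image i.
    """
    hits = 0
    for scores, relevant in zip(similarity, ground_truth):
        # Rank caption indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if relevant.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


def recall_sum(r1, r5, r10):
    """Sum of the three recall values, as the Recall@Sum column appears to do."""
    return r1 + r5 + r10


# Toy data (made up): 2 images, 4 captions, 2 correct captions per image.
toy_similarity = [
    [0.9, 0.1, 0.3, 0.2],
    [0.8, 0.2, 0.5, 0.4],
]
toy_ground_truth = [{0, 1}, {2, 3}]

toy_r1 = recall_at_k(toy_similarity, toy_ground_truth, 1)
toy_r2 = recall_at_k(toy_similarity, toy_ground_truth, 2)

# Check against the GSMN row of the leaderboard: 76.4 + 94.3 + 97.3.
gsmn_sum = recall_sum(76.4, 94.3, 97.3)
```

With the toy data, only the first image ranks a correct caption first, while both images have a correct caption within the top two. Rounded to whole numbers, `gsmn_sum` agrees with the 268 reported for GSMN.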