OpenCodePapers
Phrase Grounding on Flickr30k Entities (test)
Phrase Grounding
[Figure: Results over time]
Leaderboard
| Paper | Code | R@1 | R@5 | R@10 | Model name | Release date |
|---|---|---|---|---|---|---|
| GLIPv2: Unifying Localization and Vision-Language Understanding | ✓ | 87.7 | - | - | GLIPv2 | 2022-06-12 |
| Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | ✓ | 87.4 | 96.4 | 97.6 | FIBER-B | 2022-06-15 |
| Grounded Language-Image Pre-training | ✓ | 87.1 | 96.9 | 98.1 | GLIP | 2021-12-07 |
| PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | ✓ | 84.4 | - | - | PEVL | 2022-05-23 |
| MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | ✓ | 84.3 | 93.9 | 95.8 | MDETR-ENB5 | 2021-04-26 |
| Disentangled Motif-aware Graph Learning for Phrase Grounding | - | 78.73 | - | - | DIGN | 2021-04-13 |
| Learning Cross-modal Context Graph for Visual Grounding | ✓ | 76.74 | - | - | LCMCG | 2020-02-13 |
| Phrase Grounding by Soft-Label Chain Conditional Random Field | ✓ | 74.69 | - | - | Soft-Label Chain CRF (SL-CCRF) | 2019-09-01 |
| Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding | ✓ | 73.3 | - | - | DDPN (ResNet-101) | 2018-05-09 |
| VisualBERT: A Simple and Performant Baseline for Vision and Language | ✓ | 71.33 | 84.98 | 86.51 | VisualBERT | 2019-08-09 |
| Bilinear Attention Networks | ✓ | 69.69 | 84.22 | 86.35 | BAN (Bottom-Up detector) | 2018-05-21 |
| Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | ✓ | 48.69 | - | - | MCB | 2016-06-06 |
| Grounding of Textual Phrases in Images by Reconstruction | ✓ | 48.38 | - | - | GroundeR 100.0% annot. | 2015-11-12 |
| Learning Deep Structure-Preserving Image-Text Embeddings | - | 43.89 | 64.46 | 68.66 | DSPE | 2015-11-19 |
| Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | ✓ | 41.77 | 64.52 | 70.77 | CCA - Fast RCNN | 2015-05-19 |
| Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | ✓ | 30.83 | 58.01 | 67.15 | CCA - VGG19 | 2015-05-19 |
| Natural Language Object Retrieval | ✓ | 27.8 | - | 62.9 | SCRC | 2015-11-13 |
| Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | ✓ | 25.30 | - | 59.66 | CCA | 2015-05-19 |