OpenCodePapers
Image Captioning on COCO
Leaderboard
Metric columns, in order: CIDEr, BLEU-1, BLEU-2, BLEU-3, BLEU-4, METEOR, ROUGE, ROUGE-L. Scores after CIDEr are listed in that column order, with empty cells omitted.

| Paper | Code | CIDEr | Other scores | ModelName | ReleaseDate |
|---|---|---|---|---|---|
| Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning | ✓ Link | 143.7 | | ExpansionNet v2 | 2022-08-13 |
| Meshed-Memory Transformer for Image Captioning | ✓ Link | 131.2 | | M2 Transformer | 2019-12-17 |
| | | 131.0 | 81.1, 65.9, 51.7, 39.9, 29.4, 59.2 | IGINet | |
| UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning | ✓ Link | 127.7 | 39.6 | UNIMO-large | 2020-12-31 |
| Reflective Decoding Network for Image Captioning | | 125.2 | | RDN | 2019-08-30 |
| Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | | 121.1 | | Lyrics | 2023-12-08 |
| Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning | ✓ Link | 115 | 34.7, 58 | Bit Diffusion (20 steps) | 2022-08-08 |
| Retrieval-Augmented Multimodal Language Modeling | | 103 | | Flamingo (80B; 4-shot) | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 89.1 | | RA-CM3 (2.7B) | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 85 | | Flamingo (3B; 4-shot) | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 83.9 | | Parti | 2022-11-22 |
| CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | ✓ Link | 77.6 | 64.2, 46.3, 33.6, 24.9, 23.1, 49 | NIC (ResNet-50, CutMix) | 2019-05-13 |
| Retrieval-Augmented Multimodal Language Modeling | | 71.9 | | Vanilla CM3 | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 55.8 | | X-LXMERT | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 48 | | minDALL-E | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 38.7 | | ruDALL-E-XL | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 20.2 | | DALL-E | 2022-11-22 |