OpenCodePapers

Image Captioning on COCO

Leaderboard

Paper	Code	CIDEr	BLEU-1	BLEU-2	BLEU-3	BLEU-4	METEOR	ROUGE	ROUGE-L	Model Name	Release Date
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning	✓ Link	143.7								ExpansionNet v2	2022-08-13
Meshed-Memory Transformer for Image Captioning	✓ Link	131.2								M2 Transformer	2019-12-17
(paper not listed)		131.0	81.1	65.9	51.7	39.9	29.4		59.2	IGINet
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning	✓ Link	127.7				39.6				UNIMO-large	2020-12-31
Reflective Decoding Network for Image Captioning		125.2								RDN	2019-08-30
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects		121.1								Lyrics	2023-12-08
Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning	✓ Link	115				34.7			58	Bit Diffusion (20 steps)	2022-08-08
Retrieval-Augmented Multimodal Language Modeling		103								Flamingo (80B; 4-shot)	2022-11-22
Retrieval-Augmented Multimodal Language Modeling		89.1								RA-CM3 (2.7B)	2022-11-22
Retrieval-Augmented Multimodal Language Modeling		85								Flamingo (3B; 4-shot)	2022-11-22
Retrieval-Augmented Multimodal Language Modeling		83.9								Parti	2022-11-22
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features	✓ Link	77.6	64.2	46.3	33.6	24.9	23.1	49		NIC (ResNet-50, CutMix)	2019-05-13
Retrieval-Augmented Multimodal Language Modeling		71.9								Vanilla CM3	2022-11-22
Retrieval-Augmented Multimodal Language Modeling		55.8								X-LXMERT	2022-11-22
Retrieval-Augmented Multimodal Language Modeling		48								minDALL-E	2022-11-22
Retrieval-Augmented Multimodal Language Modeling		38.7								ruDALL-E-XL	2022-11-22
Retrieval-Augmented Multimodal Language Modeling		20.2								DALL-E	2022-11-22