OpenCodePapers

Image Captioning on COCO
Leaderboard
| Paper | Code | CIDEr | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE / ROUGE-L | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|
| Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning | ✓ | 143.7 | | | | | | | ExpansionNet v2 | 2022-08-13 |
| Meshed-Memory Transformer for Image Captioning | ✓ | 131.2 | | | | | | | M2 Transformer | 2019-12-17 |
| | | 131.0 | 81.1 | 65.9 | 51.7 | 39.9 | 29.4 | 59.2 | IGINet | |
| UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning | ✓ | 127.7 | | | | 39.6 | | | UNIMO-large | 2020-12-31 |
| Reflective Decoding Network for Image Captioning | | 125.2 | | | | | | | RDN | 2019-08-30 |
| Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | | 121.1 | | | | | | | Lyrics | 2023-12-08 |
| Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning | ✓ | 115 | | | | 34.7 | | 58 | Bit Diffusion (20 steps) | 2022-08-08 |
| Retrieval-Augmented Multimodal Language Modeling | | 103 | | | | | | | Flamingo (80B; 4-shot) | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 89.1 | | | | | | | RA-CM3 (2.7B) | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 85 | | | | | | | Flamingo (3B; 4-shot) | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 83.9 | | | | | | | Parti | 2022-11-22 |
| CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | ✓ | 77.6 | 64.2 | 46.3 | 33.6 | 24.9 | 23.1 | 49 | NIC (ResNet-50, CutMix) | 2019-05-13 |
| Retrieval-Augmented Multimodal Language Modeling | | 71.9 | | | | | | | Vanilla CM3 | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 55.8 | | | | | | | X-LXMERT | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 48 | | | | | | | minDALL-E | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 38.7 | | | | | | | ruDALL-E-XL | 2022-11-22 |
| Retrieval-Augmented Multimodal Language Modeling | | 20.2 | | | | | | | DALL-E | 2022-11-22 |