OpenCodePapers
Image Captioning on nocaps-val (overall)
Results over time (interactive chart)
Leaderboard
| Paper | Code | CIDEr | SPICE | Pretrain (#images) | Model Name | Release Date |
|-------|------|-------|-------|--------------------|------------|--------------|
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 121.6 | 15.8 | 1.1B | BLIP-2 ViT-G FlanT5 XL (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 121.0 | 15.3 | 1.1B | BLIP-2 ViT-G OPT 6.7B (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 119.7 | 15.4 | 1.1B | BLIP-2 ViT-G OPT 2.7B (zero-shot) | 2023-01-30 |
| Scaling Up Vision-Language Pre-training for Image Captioning | - | 113.4 | 15.0 | 200M | LEMON_large | 2021-11-24 |
| BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | ✓ | 113.2 | 14.8 | 129M | BLIP_ViT-L | 2022-01-28 |
| SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | ✓ | 112.2 | - | 1.8B | SimVLM | 2021-08-24 |
| BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | ✓ | 109.6 | 14.7 | 129M | BLIP_CapFilt-L | 2022-01-28 |
| OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | - | 107.5 | 14.7 | 14M | OmniVL | 2022-09-15 |
| VinVL: Revisiting Visual Representations in Vision-Language Models | ✓ | 95.5 | 13.5 | 5.7M | VinVL | 2021-01-02 |
| Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | ✓ | 90.2 | 12.1 | - | Enc-Dec | 2021-02-17 |
| Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks | ✓ | 80.9 | 11.3 | 345M | OSCAR | 2020-04-13 |
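For context on the CIDEr column: CIDEr scores a candidate caption by TF-IDF-weighted n-gram cosine similarity against the human reference captions, averaged over n-gram orders 1 to 4 and over references, and scaled by 10. The sketch below is a simplified illustration, not the official scorer: the whitespace tokenization, the IDF smoothing, and the omission of CIDEr-D's length penalty and count clipping are assumptions of this sketch.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Counter of n-gram tuples in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cider_score(candidate, references, corpus_refs, n_max=4):
    """Simplified CIDEr: mean TF-IDF-weighted cosine similarity between the
    candidate and each reference, averaged over n-gram orders, times 10.
    `corpus_refs` is the full list of per-image reference sets, used for IDF."""
    num_images = len(corpus_refs)
    total = 0.0
    for n in range(1, n_max + 1):
        # Document frequency: in how many images' reference sets each n-gram occurs.
        df = Counter()
        for refs in corpus_refs:
            seen = set()
            for ref in refs:
                seen |= set(ngrams(ref.split(), n))
            df.update(seen)

        def tfidf(counts):
            # Add-one smoothing of the IDF is a simplification of this sketch.
            return {g: c * math.log((num_images + 1) / (df[g] + 1))
                    for g, c in counts.items()}

        cand_vec = tfidf(ngrams(candidate.split(), n))
        sims = []
        for ref in references:
            ref_vec = tfidf(ngrams(ref.split(), n))
            dot = sum(cand_vec.get(g, 0.0) * w for g, w in ref_vec.items())
            norm_c = math.sqrt(sum(v * v for v in cand_vec.values()))
            norm_r = math.sqrt(sum(v * v for v in ref_vec.values()))
            sims.append(dot / (norm_c * norm_r) if norm_c and norm_r else 0.0)
        total += sum(sims) / len(sims)
    return 10.0 * total / n_max
```

Because of the IDF weighting, n-grams that appear in every image's references contribute little, so the metric rewards content words that are specific to the image; this is why CIDEr on nocaps (which stresses novel objects) separates models more sharply than BLEU-style overlap would.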