OpenCodePapers

Image Captioning on nocaps-val (overall)

Task: Image Captioning · Dataset: nocaps (validation set, overall)
Leaderboard

| Paper | Code | CIDEr | SPICE | Pretrain (#images) | Model | Release date |
|---|---|---|---|---|---|---|
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 121.6 | 15.8 | 1.1B | BLIP-2 ViT-G FlanT5 XL (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 121.0 | 15.3 | 1.1B | BLIP-2 ViT-G OPT 6.7B (zero-shot) | 2023-01-30 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 119.7 | 15.4 | 1.1B | BLIP-2 ViT-G OPT 2.7B (zero-shot) | 2023-01-30 |
| Scaling Up Vision-Language Pre-training for Image Captioning | - | 113.4 | 15.0 | 200M | LEMON_large | 2021-11-24 |
| BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | ✓ | 113.2 | 14.8 | 129M | BLIP_ViT-L | 2022-01-28 |
| SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | ✓ | 112.2 | - | 1.8B | SimVLM | 2021-08-24 |
| BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | ✓ | 109.6 | 14.7 | 129M | BLIP_CapFilt-L | 2022-01-28 |
| OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | - | 107.5 | 14.7 | 14M | OmniVL | 2022-09-15 |
| VinVL: Revisiting Visual Representations in Vision-Language Models | ✓ | 95.5 | 13.5 | 5.7M | VinVL | 2021-01-02 |
| Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | ✓ | 90.2 | 12.1 | - | Enc-Dec | 2021-02-17 |
| Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks | ✓ | 80.9 | 11.3 | 345M | OSCAR | 2020-04-13 |
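The CIDEr and SPICE columns are the standard reference-based captioning metrics, with CIDEr shown in the usual ×100 convention. Below is a minimal sketch of how a CIDEr score is computed with the community `pycocoevalcap` package on made-up captions; the leaderboard values come from the official nocaps evaluation, so this only illustrates the metric, not a reproduction of the reported scores.

```python
# pip install pycocoevalcap
from pycocoevalcap.cider.cider import Cider

# Keys are image ids; values are lists of captions
# (one or more references per image, exactly one candidate).
references = {
    "img1": ["a dog runs across a grassy field", "a brown dog running on grass"],
    "img2": ["a plate of pasta on a table"],
}
candidates = {
    "img1": ["a dog running through a field"],
    "img2": ["a bowl of noodles sits on a wooden table"],
}

# Returns a corpus-level score and a per-image array of scores.
corpus_score, per_image_scores = Cider().compute_score(references, candidates)
print(f"CIDEr: {corpus_score:.3f}")
print(per_image_scores)
```

Note that CIDEr weights n-grams by inverse document frequency computed over the reference corpus, so scores on a two-image toy corpus like this are only meaningful as a demonstration of the API.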
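"Zero-shot" for the top entries means the model captions nocaps images without any fine-tuning on nocaps. A minimal sketch of running the leading model, assuming the `Salesforce/blip2-flan-t5-xl` checkpoint and the Hugging Face `transformers` library (>= 4.27); this is not the authors' evaluation pipeline, and decoding settings affect the scores.

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xl", torch_dtype=dtype
).to(device)

# Any local image stands in here for a nocaps validation image.
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to(device, dtype)

# Greedy decoding for brevity; reported captioning numbers typically use beam search.
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```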