OpenCodePapers

image-captioning-on-nocaps-xd-entire

Image Captioning
Leaderboard
| Paper | Code | CIDEr | B1 | B2 | B3 | B4 | ROUGE-L | METEOR | SPICE | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GIT: A Generative Image-to-text Transformer for Vision and Language | ✓ Link | 124.77 | 88.43 | 75.02 | 57.87 | 37.65 | 63.19 | 32.56 | 16.06 | GIT2 | 2022-05-27 |
| GIT: A Generative Image-to-text Transformer for Vision and Language | ✓ Link | 123.39 | 88.1 | 74.81 | 57.68 | 37.35 | 63.12 | 32.5 | 15.94 | GIT | 2022-05-27 |
| Scaling Up Vision-Language Pre-training for Image Captioning | | 114.25 | 85.62 | 71.36 | 53.62 | 34.65 | 61.2 | 31.27 | 14.85 | Microsoft Cognitive Services team | 2021-11-24 |
| | | 102.39 | 83.69 | 67.96 | 49.38 | 29.69 | 58.99 | 29.68 | 14.71 | VLAF2 | |
| VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | | 100.12 | 82.27 | 66.04 | 47.48 | 28.95 | 58.26 | 29.47 | 14.04 | Microsoft Cognitive Services team | 2020-09-28 |
| | | 85.34 | 76.64 | 56.46 | 36.37 | 19.48 | 52.83 | 28.15 | 14.67 | Human | |
| | | 85.3 | 78.77 | 61.54 | 41.85 | 23.77 | 54.59 | 25.96 | 11.84 | icp2ssi1_coco_si_0.02_5_test | |
| | | 85.02 | 79.17 | 60.29 | 39.06 | 20.81 | 53.39 | 26.54 | 12.74 | test_cbs2 | |
| | | 73.09 | 76.59 | 56.74 | 35.39 | 18.41 | 51.82 | 24.42 | 11.2 | UpDown + ELMo + CBS | |
| | | 61.48 | 73.42 | 52.12 | 29.35 | 12.88 | 48.74 | 22.06 | 9.69 | Neural Baby Talk + CBS | |
| | | 54.25 | 74.0 | 55.11 | 35.23 | 19.16 | 50.92 | 22.96 | 10.14 | UpDown | |
| | | 53.36 | 72.33 | 52.42 | 30.83 | 14.73 | 48.87 | 21.52 | 9.15 | Neural Baby Talk | |